Realtime API Overview

MiniCPM-o exposes two real-time full-duplex conversation modes over WebSocket.

API Host

https://minicpmo45.modelbest.cn

Endpoint

wss://host/v1/realtime?mode={video|audio}

Parameter	Required	Description
`mode`	No	Either `video` (default) or `audio`. Determines the session duration limit and the recommended input modality.
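To show how the `mode` parameter fits into the endpoint, here is a minimal Python sketch that builds the connection URL. It assumes the WebSocket endpoint is served on the same host as the API host above. `realtime_url` and `VALID_MODES` are hypothetical names used for illustration, not part of the demo code.

```python
from urllib.parse import urlencode, urlsplit

# Values accepted by the `mode` query parameter.
VALID_MODES = ("video", "audio")


def realtime_url(api_host: str, mode: str = "video") -> str:
    """Build the Realtime WebSocket URL for `api_host` and `mode`.

    ASSUMPTION: the wss:// endpoint lives on the same host as the API host.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Accept a bare hostname or a full URL such as "https://minicpmo45.modelbest.cn".
    netloc = urlsplit(api_host).netloc if "://" in api_host else api_host
    # Only `mode` goes in the query string; session_id is assigned by the server.
    query = urlencode({"mode": mode})
    return f"wss://{netloc}/v1/realtime?{query}"
```

For example, `realtime_url("https://minicpmo45.modelbest.cn", "audio")` returns `"wss://minicpmo45.modelbest.cn/v1/realtime?mode=audio"`. Validating `mode` on the client catches typos before the connection is opened.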

The server generates `session_id` after the connection is established. Its format is `rt_{timestamp_ms}`, and the value is returned in the `session.created` event. Clients should not pass `session_id` in the URL.
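Because `session_id` encodes a millisecond timestamp, a client can recover the session start time from it, for example for logging. A minimal sketch, assuming `timestamp_ms` is Unix epoch time in milliseconds. `session_created_at` is a hypothetical helper; extracting the value from `session.created` depends on the field layout defined in the protocol documents and is not assumed here.

```python
import re
from datetime import datetime, timezone

# Server-assigned IDs look like rt_{timestamp_ms}, e.g. "rt_1700000000000".
_SESSION_ID_RE = re.compile(r"rt_(\d+)")


def session_created_at(session_id: str) -> datetime:
    """Return the UTC time encoded in a server-assigned session_id.

    ASSUMPTION: timestamp_ms is Unix epoch time in milliseconds.
    """
    match = _SESSION_ID_RE.fullmatch(session_id)
    if match is None:
        raise ValueError(f"unexpected session_id format: {session_id!r}")
    timestamp_ms = int(match.group(1))
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```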

Modes

Mode	Endpoint Example	Upstream Data	Session Duration Limit	Effective Conversation Time
Video Full-Duplex	`wss://host/v1/realtime?mode=video`	Audio + video frames	5 minutes	~90 seconds
Audio Full-Duplex	`wss://host/v1/realtime?mode=audio`	Audio only	10 minutes	~8 minutes
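Since each mode has a fixed session duration limit, a client may want to track how much time remains so it can warn the user or wrap up before the session ends. A minimal sketch, assuming the limit is counted from when the connection opens; the client's clock only approximates the server's. `SESSION_LIMIT_S` and `remaining_session_time` are hypothetical names.

```python
import time
from typing import Optional

# Session duration limits from the Modes table, in seconds.
SESSION_LIMIT_S = {"video": 5 * 60, "audio": 10 * 60}


def remaining_session_time(mode: str, started_at: float,
                           now: Optional[float] = None) -> float:
    """Seconds left before the session limit for `mode` is reached.

    `started_at` and `now` are time.monotonic() readings taken by the client.
    ASSUMPTION: the server counts the limit from connection time.
    """
    if now is None:
        now = time.monotonic()
    elapsed = now - started_at
    # Clamp at zero once the limit has passed.
    return max(0.0, SESSION_LIMIT_S[mode] - elapsed)
```

A monotonic clock is used so that wall-clock adjustments during a session do not distort the countdown.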

Both modes share the same event names and message structure. Apart from the session duration limits above, the only difference is the recommended input modality:

Video Full-Duplex: `input_audio_buffer.append` should usually include `video_frames`.
Audio Full-Duplex: `input_audio_buffer.append` should not include `video_frames`; behavior is undefined if it does.

The mode cannot be changed during a session. To switch modes, open a new connection with a different `mode` value.
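The modality rules for the two modes can be enforced on the client when building `input_audio_buffer.append` messages. A minimal sketch: only the event name and the `video_frames` field come from this page. The `type` and `audio` keys, the base64 encoding, and the list-of-frames shape are assumptions for illustration, so check the JSON Schema for the actual message layout. `build_append_event` is a hypothetical helper.

```python
import base64
import json
from typing import Optional, Sequence


def build_append_event(mode: str, audio_chunk: bytes,
                       video_frames: Optional[Sequence[bytes]] = None) -> str:
    """Serialize an input_audio_buffer.append event for the session's mode.

    ASSUMPTION: the `type`/`audio` keys and base64 encoding are illustrative.
    """
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(audio_chunk).decode("ascii"),
    }
    if video_frames:
        if mode == "audio":
            # Audio Full-Duplex: sending video_frames leads to undefined behavior.
            raise ValueError("video_frames must not be sent in audio mode")
        # Video Full-Duplex: attach each encoded frame.
        event["video_frames"] = [
            base64.b64encode(frame).decode("ascii") for frame in video_frames
        ]
    return json.dumps(event)
```

Because the mode is fixed per session, a client can pass the same `mode` it connected with to every call.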

Protocol Documents

Video Full-Duplex Protocol — full-duplex conversation with video frames.
Audio Full-Duplex Protocol — audio-only full-duplex conversation.
JSON Schema — machine-readable message schema.

Example Code

For complete client implementations, see the full-duplex demo pages in this repository. They use the Realtime API protocol directly:

Page	Description
`static/omni/`	Video Full-Duplex — real-time audio/video conversation
`static/audio-duplex/`	Audio Full-Duplex — real-time audio-only conversation

Core client wrapper: `static/duplex/lib/realtime-session.js`

Repository: https://github.com/OpenBMB/MiniCPM-o-Demo/tree/realtime-protocol