
Realtime API Overview

MiniCPM-o Realtime API protocol overview

MiniCPM-o exposes two real-time full-duplex conversation modes over WebSocket.

API Host

https://minicpmo45.modelbest.cn

Endpoint

wss://host/v1/realtime?mode={video|audio}
| Parameter | Required | Description |
|---|---|---|
| mode | No | video (default) or audio. Determines the session duration limit and the recommended input modality. |
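As a minimal sketch, a client can assemble the endpoint URL from a host name and a mode, rejecting anything other than video or audio. The function name build_realtime_url is hypothetical. The example host minicpmo45.modelbest.cn is taken from the API Host above, which is listed with https://; that the same host name accepts wss:// connections is an assumption here. The URL carries only mode, and session_id is never added.

```python
from urllib.parse import urlencode, urlunsplit

# The two modes documented for the mode query parameter; video is the default.
VALID_MODES = ("video", "audio")


def build_realtime_url(host: str, mode: str = "video") -> str:
    """Build wss://{host}/v1/realtime?mode={mode} (hypothetical helper).

    Only mode is set; session_id is assigned by the server, not the client.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # urlunsplit joins (scheme, netloc, path, query, fragment) into one URL.
    return urlunsplit(("wss", host, "/v1/realtime", urlencode({"mode": mode}), ""))
```

For example, build_realtime_url("minicpmo45.modelbest.cn", "audio") returns wss://minicpmo45.modelbest.cn/v1/realtime?mode=audio.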

session_id is generated by the server after the connection is established. The format is rt_{timestamp_ms} and the value is returned in the session.created event. Clients do not need to, and should not, pass session_id in the URL.
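A client that logs or displays session start times could decode the session_id it receives. The helper below is hypothetical. It assumes timestamp_ms means milliseconds since the Unix epoch in UTC, which this page does not state; the page only specifies the rt_{timestamp_ms} shape. The helper takes the session_id string itself and makes no assumption about where that value sits inside the session.created event.

```python
import re
from datetime import datetime, timedelta, timezone

# session_id format documented above: rt_{timestamp_ms}
_SESSION_ID_RE = re.compile(r"rt_(\d+)")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def session_created_at(session_id: str) -> datetime:
    """Decode the timestamp embedded in a server-issued session_id (hypothetical helper)."""
    match = _SESSION_ID_RE.fullmatch(session_id)
    if match is None:
        raise ValueError(f"unexpected session_id format: {session_id!r}")
    # ASSUMPTION: timestamp_ms is milliseconds since the Unix epoch (UTC).
    # timedelta keeps whole milliseconds exact, unlike float division.
    return _EPOCH + timedelta(milliseconds=int(match.group(1)))
```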

Modes

| Mode | Endpoint Example | Upstream Data | Session Duration | Effective Conversation Time |
|---|---|---|---|---|
| Video Full-Duplex | wss://host/v1/realtime?mode=video | Audio + video frames | 5 minutes | ~90 seconds |
| Audio Full-Duplex | wss://host/v1/realtime?mode=audio | Audio only | 10 minutes | ~8 minutes |
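Because each mode has a fixed session limit, a client may want to show a countdown. This is a hypothetical sketch: the SessionClock name is illustrative, and it assumes the client starts the clock when session.created arrives. The server remains the authority on when a session ends, so the local value is only an estimate. The limits come from the Session Duration column above; the approximate effective conversation times are not modeled.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Session duration limits per mode, in seconds (from the Modes table).
SESSION_LIMIT_S = {"video": 5 * 60, "audio": 10 * 60}


@dataclass
class SessionClock:
    """Client-side estimate of the time left before a session hits its limit."""

    mode: str
    # ASSUMPTION: taken when session.created is received; monotonic, not wall clock.
    started_at: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        if self.mode not in SESSION_LIMIT_S:
            raise ValueError(f"unknown mode: {self.mode!r}")

    def remaining(self, now: Optional[float] = None) -> float:
        """Seconds left before the session limit, never below zero."""
        if now is None:
            now = time.monotonic()
        elapsed = now - self.started_at
        return max(0.0, SESSION_LIMIT_S[self.mode] - elapsed)
```

Using time.monotonic keeps the countdown stable if the system wall clock is adjusted during a session.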

Both modes share the same event names and message structure. Apart from the session limits listed above, they differ only in the recommended input modality:

  • Video Full-Duplex: input_audio_buffer.append should usually include video_frames.
  • Audio Full-Duplex: input_audio_buffer.append should not include video_frames; behavior is undefined if it does.
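A client can enforce these rules before anything is sent. The sketch below serializes an input_audio_buffer.append event as JSON. Only the type value and the video_frames field name come from this page. The audio field name, the base64 encoding, and the frame payload format are assumptions for illustration, so check them against the protocol documents before use. In audio mode the helper raises an error instead of attaching frames, because the resulting behavior would be undefined.

```python
import base64
import json
from typing import Optional, Sequence


def _b64(data: bytes) -> str:
    # Base64-encode raw bytes into an ASCII string suitable for JSON.
    return base64.b64encode(data).decode("ascii")


def build_append_event(
    mode: str,
    audio_chunk: bytes,
    video_frames: Optional[Sequence[bytes]] = None,
) -> str:
    """Serialize one input_audio_buffer.append event (hypothetical helper)."""
    # ASSUMPTION: the audio payload field is named "audio" and is base64-encoded.
    event: dict = {"type": "input_audio_buffer.append", "audio": _b64(audio_chunk)}

    if mode == "video":
        # Video mode should usually carry frames; attach them when provided.
        # ASSUMPTION: each frame is encoded image bytes (e.g. JPEG) sent as base64.
        if video_frames:
            event["video_frames"] = [_b64(frame) for frame in video_frames]
    elif mode == "audio":
        # Audio mode must not send video_frames; server behavior would be undefined.
        if video_frames:
            raise ValueError("video_frames must not be sent in audio mode")
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return json.dumps(event)
```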

The mode cannot be changed during a session.

Protocol Documents

Example Code

For complete client implementations, see the full-duplex demo pages in this repository. They use the Realtime API protocol directly:

| Page | Description |
|---|---|
| static/omni/ | Video Full-Duplex: real-time audio/video conversation |
| static/audio-duplex/ | Audio Full-Duplex: real-time audio-only conversation |

Core client wrapper: static/duplex/lib/realtime-session.js

Repository: https://github.com/OpenBMB/MiniCPM-o-Demo/tree/realtime-protocol
