POST /v1/audio/speech
```shell
curl --request POST \
  --url https://api.powertokens.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "model": "speech-2.8-hd",
  "input": "Summarize today's system status in a natural tone.",
  "voice": "Chinese (Mandarin)_Lyrical_Voice",
  "response_format": "mp3",
  "metadata": {
    "output_format": "url"
  }
}
EOF
```
Example response: "<string>"
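For callers not using curl, the same request can be sketched with Python's standard library (urllib.request). This is a minimal sketch that assumes the endpoint accepts exactly the JSON body from the curl example; build_speech_request and send are hypothetical helper names, and the token is passed in explicitly rather than read from any particular environment variable.

```python
import json
import urllib.request

# Endpoint from the curl example.
API_URL = "https://api.powertokens.ai/v1/audio/speech"


def build_speech_request(body: dict, token: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (not sent here)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Perform the network call and return the raw, unparsed response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()


# Same body as the curl example.
example_body = {
    "model": "speech-2.8-hd",
    "input": "Summarize today's system status in a natural tone.",
    "voice": "Chinese (Mandarin)_Lyrical_Voice",
    "response_format": "mp3",
    "metadata": {"output_format": "url"},
}

# Usage (requires network access and a real token):
#   raw = send(build_speech_request(example_body, "<token>"))
```

Building the request separately from sending it keeps the header and body logic checkable without touching the network.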

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body (application/json)

model (enum<string>, required)

Available options: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo.

input (string, required)

Input text to synthesize.

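voice (string, required)

Voice ID, for example Chinese (Mandarin)_Lyrical_Voice as used in the request example. The gateway maps it to the upstream voice_setting.voice_id field.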

speed (number, optional)

Speaking rate. The gateway maps it to the upstream voice_setting.speed field.

response_format (string, optional)

Target audio encoding format. The gateway maps it to the upstream audio_setting.format field. Common values include mp3, wav, and pcm.

stream_format (string, optional)

Enables streaming. Any non-empty value sets stream=true on the upstream request; the gateway currently checks only whether the field is non-empty, not which value it holds.
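The field mapping described for voice, speed, response_format and stream_format can be illustrated with a hypothetical translation function. This is not the gateway's actual code: the upstream names voice_setting.voice_id, voice_setting.speed, audio_setting.format and stream come from the descriptions on this page, while passing model through unchanged and sending input as an upstream field called text are assumptions made only for illustration.

```python
from typing import Any, Dict


def to_upstream(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the gateway's request-field mapping."""
    # voice -> voice_setting.voice_id
    voice_setting: Dict[str, Any] = {"voice_id": body["voice"]}
    if "speed" in body:
        # speed -> voice_setting.speed
        voice_setting["speed"] = body["speed"]

    upstream: Dict[str, Any] = {
        "model": body["model"],  # assumption: passed through unchanged
        "text": body["input"],   # assumption: upstream name for the input text
        "voice_setting": voice_setting,
        # Only non-emptiness matters: "", None or a missing field means no streaming.
        "stream": bool(body.get("stream_format")),
    }
    if "response_format" in body:
        # response_format -> audio_setting.format
        upstream["audio_setting"] = {"format": body["response_format"]}
    return upstream
```

Using bool() on stream_format captures the documented rule directly: any non-empty string turns streaming on, whatever its content.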

metadata (object, optional)

Container for MiniMax-specific extensions. The only field currently documented as stable is output_format; the request example sets it to "url".
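A small client-side helper can assemble the body from these fields and reject an unsupported model before any request is sent. This is a sketch: build_speech_body is a hypothetical name, its only validation mirrors the documented model options, and the gateway may apply further checks that this page does not describe.

```python
from typing import Any, Dict, Optional

# The values listed under "model".
SUPPORTED_MODELS = frozenset({
    "speech-2.8-hd", "speech-2.8-turbo",
    "speech-2.6-hd", "speech-2.6-turbo",
    "speech-02-hd", "speech-02-turbo",
})


def build_speech_body(
    model: str,
    text: str,
    voice: str,
    *,
    speed: Optional[float] = None,
    response_format: Optional[str] = None,
    stream_format: Optional[str] = None,
    output_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body; optional fields are included only when set."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    body: Dict[str, Any] = {"model": model, "input": text, "voice": voice}
    if speed is not None:
        body["speed"] = speed
    if response_format is not None:
        body["response_format"] = response_format
    if stream_format:  # only a non-empty string requests streaming
        body["stream_format"] = stream_format
    if output_format is not None:
        body["metadata"] = {"output_format": output_format}
    return body
```

Omitting unset optional fields, rather than sending nulls, leaves the defaults to the gateway.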

Response

Streaming success: the gateway returns MiniMax server-sent events (SSE). For internal billing settlement, the gateway reads extra_info.usage_characters as the unified count of billable input characters.

SSE payload: each data: event in the stream can be parsed as a JSON MiniMax response chunk containing data.audio, trace_id, and base_resp.
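A streaming response can be consumed line by line. The sketch below uses a hypothetical helper, parse_speech_sse, that reads each data: line as a JSON chunk, collects data.audio, and records trace_id and extra_info.usage_characters. Several details are assumptions not stated on this page and are marked in the comments: that a non-zero base_resp.status_code signals an error, that data.audio is hex-encoded, that extra_info sits at the top level of a chunk, and that the final chunk carrying extra_info repeats the complete audio, so its audio is skipped to avoid duplication. Check these against real responses before relying on them.

```python
import json
from typing import Iterable, Optional, Tuple


def parse_speech_sse(lines: Iterable[str]) -> Tuple[bytes, Optional[int], Optional[str]]:
    """Return (audio bytes, usage_characters, trace_id) from SSE text lines."""
    audio = bytearray()
    usage: Optional[int] = None
    trace_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        chunk = json.loads(payload)

        base = chunk.get("base_resp") or {}
        # Assumption: status_code 0 means success; anything else is an error.
        code = base.get("status_code", 0)
        if code != 0:
            raise RuntimeError(f"upstream error {code}: {base.get('status_msg')}")
        trace_id = chunk.get("trace_id", trace_id)

        # Assumption: extra_info is a top-level key of the chunk.
        extra = chunk.get("extra_info")
        if extra is not None:
            # The field the gateway reads for settlement.
            usage = extra.get("usage_characters", usage)
            # Assumption: this final chunk repeats the full audio, so skip it.
            continue

        piece = (chunk.get("data") or {}).get("audio")
        if piece:
            audio.extend(bytes.fromhex(piece))  # assumption: hex-encoded audio
    return bytes(audio), usage, trace_id
```

With urllib.request, the lines can come straight from the open response, for example parse_speech_sse(line.decode("utf-8") for line in resp), since iterating an HTTP response yields it line by line.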