Text to speech

curl --request POST \
  --url https://api.powertokens.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "model": "speech-2.8-hd",
  "input": "Summarize today's system status in a natural tone.",
  "voice": "Chinese (Mandarin)_Lyrical_Voice",
  "response_format": "mp3",
  "metadata": {
    "output_format": "url"
  }
}
EOF
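For comparison, the same request can be assembled in Python using only the standard library. This is a sketch: build_speech_request is a hypothetical helper, "<token>" is a placeholder, and the function only builds the request. Passing the result to urllib.request.urlopen would actually send it.

```python
import json
import urllib.request

API_URL = "https://api.powertokens.ai/v1/audio/speech"


def build_speech_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the text-to-speech endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={
            # Bearer authentication, as described under Authorizations.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Same body as the curl example.
body = {
    "model": "speech-2.8-hd",
    "input": "Summarize today's system status in a natural tone.",
    "voice": "Chinese (Mandarin)_Lyrical_Voice",
    "response_format": "mp3",
    "metadata": {"output_format": "url"},
}
req = build_speech_request("<token>", body)
# To send it: urllib.request.urlopen(req) performs the HTTP call.
```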

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body (application/json)

model (enum<string>, required)

Model used for synthesis. Available options: speech-2.8-hd, speech-2.8-turbo, speech-2.6-hd, speech-2.6-turbo, speech-02-hd, speech-02-turbo.
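A client can reject unsupported model names before calling the API. This is a minimal sketch: SUPPORTED_MODELS and check_model are hypothetical names, and the set simply mirrors the list above, so it may fall out of date if the gateway adds models.

```python
# Mirrors the documented options; update if the gateway adds or retires models.
SUPPORTED_MODELS = frozenset({
    "speech-2.8-hd",
    "speech-2.8-turbo",
    "speech-2.6-hd",
    "speech-2.6-turbo",
    "speech-02-hd",
    "speech-02-turbo",
})


def check_model(model: str) -> str:
    """Return model unchanged if it is a documented option, else raise ValueError."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {sorted(SUPPORTED_MODELS)}"
        )
    return model
```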

input (string, required)

Input text to synthesize.

voice (string, required)

Voice identifier, forwarded upstream as voice_setting.voice_id (for example, Chinese (Mandarin)_Lyrical_Voice).

speed (number)

Speaking rate, forwarded upstream as voice_setting.speed.

response_format (string)

Target audio encoding format, forwarded upstream as audio_setting.format. Common values include mp3, wav, and pcm.

stream_format (string)

Streaming trigger. Any non-empty value sets stream=true on the upstream request. The gateway currently checks only that the field is non-empty and ignores its actual value.

metadata (object)

MiniMax extension container. The only stable field currently documented here is output_format. The example request above sets it to url.
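The field mappings above can be summarized as a translation from the gateway's request body to an upstream MiniMax request. This is a sketch under stated assumptions, not the gateway's actual code. Only voice_setting.voice_id, voice_setting.speed, audio_setting.format, and stream=true come from this page. The function name to_upstream, the upstream key text for the input, and forwarding metadata.output_format as a top-level output_format field are all hypothetical.

```python
def to_upstream(body: dict) -> dict:
    """Translate a gateway request body into a MiniMax-style upstream body (sketch)."""
    # voice -> voice_setting.voice_id; speed -> voice_setting.speed (only if given).
    voice_setting = {"voice_id": body["voice"]}
    if body.get("speed") is not None:
        voice_setting["speed"] = body["speed"]

    upstream = {
        "model": body["model"],
        # Assumption: the upstream field for the input text is named "text".
        "text": body["input"],
        "voice_setting": voice_setting,
    }

    # response_format -> audio_setting.format.
    if body.get("response_format"):
        upstream["audio_setting"] = {"format": body["response_format"]}

    # Any non-empty stream_format enables streaming; its value is not inspected.
    if body.get("stream_format"):
        upstream["stream"] = True

    # Assumption: metadata.output_format is forwarded as a top-level upstream field.
    output_format = (body.get("metadata") or {}).get("output_format")
    if output_format:
        upstream["output_format"] = output_format

    return upstream
```

Optional fields are copied only when present, so the upstream service applies its own defaults for anything the client leaves out.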


Response

Streaming success. The gateway returns MiniMax server-sent event (SSE) payloads. For internal billing settlement, it reads extra_info.usage_characters as the unified count of input characters synthesized.

Each data: event in the stream can be parsed as a MiniMax response chunk containing data.audio, trace_id, and base_resp.
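The following is a minimal sketch of consuming the stream once it has been split into decoded text lines. It makes several assumptions that this page does not state:

- Each event is a line starting with data: followed by a JSON object.
- data.audio holds hex-encoded audio bytes, following MiniMax's own API convention.
- A non-zero base_resp.status_code signals an error.
- extra_info appears on at most one chunk.

parse_sse_lines is a hypothetical helper. This page does not say whether a final chunk repeats earlier audio, so the sketch keeps each chunk separate rather than concatenating them into one file.

```python
import json
from typing import Iterable, List, Optional


def parse_sse_lines(lines: Iterable[str]) -> dict:
    """Collect audio chunks, trace ID, and usage from MiniMax-style SSE lines (sketch)."""
    audio_chunks: List[bytes] = []
    trace_id: Optional[str] = None
    usage_characters: Optional[int] = None

    for raw in lines:
        line = raw.strip()
        # Only "data:" lines carry payloads; blank lines separate events.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        chunk = json.loads(payload)

        # Assumption: a non-zero status_code in base_resp means the upstream failed.
        base_resp = chunk.get("base_resp") or {}
        if base_resp.get("status_code", 0) != 0:
            raise RuntimeError(f"upstream error: {base_resp.get('status_msg')}")

        trace_id = chunk.get("trace_id", trace_id)

        audio_hex = (chunk.get("data") or {}).get("audio")
        if audio_hex:
            # Assumption: audio is hex-encoded, as in MiniMax's own API.
            audio_chunks.append(bytes.fromhex(audio_hex))

        # The gateway bills on extra_info.usage_characters.
        extra_info = chunk.get("extra_info") or {}
        if "usage_characters" in extra_info:
            usage_characters = extra_info["usage_characters"]

    return {
        "audio_chunks": audio_chunks,
        "trace_id": trace_id,
        "usage_characters": usage_characters,
    }
```

To use it on a live response, read the HTTP body line by line, decode each line as UTF-8, and pass the lines in.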
