Text to speech - Mint Starter Kit

curl --request POST \
  --url https://api.powertokens.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "qwen3-tts-instruct-flash",
  "input": "Introduce this product with a slightly faster speaking rate.",
  "voice": "Cherry",
  "instructions": "Speak faster with a brighter tone.",
  "optimize_instructions": false,
  "language_type": "English"
}
'

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

model

enum<string>

required

Available options:

qwen3-tts-instruct-flash

input

string

required

Input text to synthesize.

voice

string

required

Voice name.

instructions

string

Style-control instruction.

optimize_instructions

boolean

Whether to optimize instructions. An explicit false is preserved.

language_type

string

Language type.

stream_format

string

Streaming output format. Any non-empty value enables streaming on the unified endpoint; pcm is the common Ali value.
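
The body fields above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the function name `build_speech_request` and its keyword arguments are hypothetical, and it assumes optional fields may simply be left out of the JSON body when unset. It only builds the request object and does not send it.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.powertokens.ai/v1/audio/speech"


def build_speech_request(
    token: str,
    input_text: str,
    voice: str,
    model: str = "qwen3-tts-instruct-flash",
    instructions: Optional[str] = None,
    optimize_instructions: Optional[bool] = None,
    language_type: Optional[str] = None,
    stream_format: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    # Required fields, per the Body section.
    body = {"model": model, "input": input_text, "voice": voice}

    # Optional fields. Assumption: omitting a field is equivalent to not
    # setting it. Only None is skipped, so an explicit False for
    # optimize_instructions is still sent.
    optional = {
        "instructions": instructions,
        "optimize_instructions": optimize_instructions,
        "language_type": language_type,
        "stream_format": stream_format,
    }
    for key, value in optional.items():
        if value is not None:
            body[key] = value

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Filtering on `None` rather than on truthiness is what keeps `"optimize_instructions": false` in the payload.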

Response

Streaming success. Returns Ali SSE (server-sent events) payloads. For internal settlement, the gateway reads usage.characters as the unified input audio character count.

The response is of type string.
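
A streamed response arrives as SSE text, so a client has to split it into events before reading any payload. The sketch below shows one way to do that in Python. The payload schema is not documented here, so treating each `data:` payload as JSON that may carry a `usage.characters` field is an assumption, and the helper names `iter_sse_data` and `usage_characters` are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):
            # Lines starting with a colon are SSE comments (e.g. keep-alives).
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buf.append(value[1:] if value.startswith(" ") else value)
    if buf:
        # Lenient: flush a trailing event that lacks a closing blank line.
        yield "\n".join(buf)


def usage_characters(data: str) -> Optional[int]:
    """Return usage.characters if the payload is JSON and has it, else None.

    Assumption: the payload is a JSON object with an optional "usage" object.
    """
    try:
        obj = json.loads(data)
    except json.JSONDecodeError:
        return None
    usage = obj.get("usage") if isinstance(obj, dict) else None
    if isinstance(usage, dict):
        return usage.get("characters")
    return None
```

The parser takes any iterable of decoded text lines, so it can be fed from a file-like HTTP response or from a list of strings in tests.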
