POST /v1/audio/speech
curl --request POST \
  --url https://api.powertokens.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "qwen3-tts-instruct-flash",
  "input": "Introduce this product with a slightly faster speaking rate.",
  "voice": "Cherry",
  "instructions": "Speak faster with a brighter tone.",
  "optimize_instructions": false,
  "language_type": "English"
}
'
"<string>"

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
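As an illustration of the header format described above, the following is a minimal sketch that prepares (but does not send) the request using only the Python standard library. The function name `build_speech_request` and the constant `API_URL` are hypothetical names introduced for this example; the URL and header values are taken from the curl sample.

```python
import json
import urllib.request

# Endpoint URL copied from the curl sample on this page.
API_URL = "https://api.powertokens.ai/v1/audio/speech"


def build_speech_request(token: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the Bearer header (hypothetical helper).

    The request is only constructed here; nothing is sent over the network.
    """
    if not token:
        raise ValueError("an auth token is required")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Header of the form "Bearer <token>", as described above.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the header logic easy to check without a live token.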

Body

application/json
model
enum<string>
required
Available options:
qwen3-tts-instruct-flash
input
string
required

Input text to synthesize.

voice
string
required

Voice name, for example Cherry.

instructions
string

Natural-language instruction that controls the speaking style, for example "Speak faster with a brighter tone."

optimize_instructions
boolean

Whether to optimize the instructions. An explicit false is preserved in the request rather than treated as if the field were omitted.

language_type
string

Language to synthesize, for example English.

stream_format
string

Streaming output format. Any non-empty value enables streaming on the unified endpoint; pcm is the common Ali value.
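The body fields above can be assembled as in the following sketch. It is an illustration under stated assumptions, not the gateway's own client: the helper name `build_speech_payload` is hypothetical, and the choice to omit unset optional fields is this example's convention. It keeps an explicit `False` for `optimize_instructions` and only includes `stream_format` when it is non-empty, since any non-empty value enables streaming.

```python
from typing import Any, Dict, Optional


def build_speech_payload(
    input_text: str,
    voice: str,
    model: str = "qwen3-tts-instruct-flash",
    instructions: Optional[str] = None,
    optimize_instructions: Optional[bool] = None,
    language_type: Optional[str] = None,
    stream_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/audio/speech (hypothetical helper)."""
    if not input_text or not voice:
        raise ValueError("input and voice are required")

    payload: Dict[str, Any] = {"model": model, "input": input_text, "voice": voice}

    if instructions is not None:
        payload["instructions"] = instructions
    # Compare against None, not truthiness, so an explicit False is kept.
    if optimize_instructions is not None:
        payload["optimize_instructions"] = optimize_instructions
    if language_type is not None:
        payload["language_type"] = language_type
    # Only a non-empty value is sent, because that is what enables streaming.
    if stream_format:
        payload["stream_format"] = stream_format
    return payload
```

For example, `build_speech_payload("Hello", "Cherry", stream_format="pcm")` produces a body that requests streaming, while leaving `stream_format` unset produces a body without that key.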

Response

Returned when streaming succeeds. The body is a stream of Ali SSE (server-sent events) payloads. For internal settlement, the gateway reads usage.characters as the unified input audio character count.

The response is of type string.
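To show how a client might read the same usage.characters value from the stream, here is a minimal sketch of an SSE line parser. The exact payload shape is not documented on this page, so the sketch assumes each event's `data:` field holds a JSON object with a top-level `usage` object; events that are not JSON are skipped. The function names `iter_sse_data` and `last_usage_characters` are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the joined data payload of each server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # Flush a trailing event even without a final blank line
    # (more lenient than the SSE specification, which would discard it).
    if buffer:
        yield "\n".join(buffer)


def last_usage_characters(lines: Iterable[str]) -> Optional[int]:
    """Return the last usage.characters value seen, or None if absent.

    Assumes (not confirmed by this page) that events carry JSON objects
    with a top-level "usage" object.
    """
    characters: Optional[int] = None
    for data in iter_sse_data(lines):
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue  # skip non-JSON payloads
        usage = event.get("usage") if isinstance(event, dict) else None
        if isinstance(usage, dict) and "characters" in usage:
            characters = usage["characters"]
    return characters
```

The functions accept any iterable of text lines, so they can be fed decoded lines from an HTTP response or a list of strings captured for debugging.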