Voice AI Platform

Complete API & platform documentation for the multilingual Text-to-Speech REST API.

Platform Overview

Voice AI is a production-grade TTS API supporting 14 neural voices across 7 language variants (Tamil, Hindi, Telugu, Malayalam, and English for India, the US, and the UK). It serves cached responses in about 1-3 ms and provides per-key rate limiting, monthly character quotas, and full request-level audit logging.

14 Neural Voices

7 Language Variants

<3ms Cached Response

GPU Required

48kbps MP3 Output

Architecture

Client ──HTTPS──▶ Nginx Ingress (TLS + rate limit 50rps)
                        │
                        ▼
               FastAPI Pod (uvicorn, 2 async workers)
                  │            │            │
                  ▼            ▼            ▼
              API Key      Rate Limit   TTS Engine
              Lookup       Check        (edge-tts)
            (PostgreSQL)   (Redis)          │
                                            ▼
                             Microsoft Edge Neural TTS (WSS)
                                            │
                  ┌─────────────────────────┘
                  ▼
             Redis Cache ──▶ Return audio/mpeg
        (1h TTL, SHA-256 keyed)
                  │
                  ▼
          Usage Logging (PostgreSQL)
            ├─ usage_logs (per-request)
            ├─ daily_usage (aggregated)
            └─ api_keys (counters updated)

Supported Voices & Languages

Language	Code	Female Voice	Male Voice	Sample Text
Tamil	`ta-IN`	ta-IN-PallaviNeural	ta-IN-ValluvarNeural	வணக்கம், இது ஒரு குரல் சோதனை.
Hindi	`hi-IN`	hi-IN-SwaraNeural	hi-IN-MadhurNeural	नमस्ते, यह एक आवाज़ परीक्षण है।
Telugu	`te-IN`	te-IN-ShrutiNeural	te-IN-MohanNeural	నమస్కారం, ఇది ఒక వాయిస్ టెస్ట్.
Malayalam	`ml-IN`	ml-IN-SobhanaNeural	ml-IN-MidhunNeural	നമസ്കാരം, ഇതൊരു ശബ്ദ പരിശോധനയാണ്.
English (IN)	`en-IN`	en-IN-NeerjaNeural	en-IN-PrabhatNeural	Hello, this is a voice test.
English (US)	`en-US`	en-US-JennyNeural	en-US-GuyNeural	Hello, this is a voice test.
English (UK)	`en-GB`	en-GB-SoniaNeural	en-GB-RyanNeural	Hello, this is a voice test.

Authentication

All API endpoints (except /health and /api/v1/voices) require an API key passed via the X-API-Key header.

curl -H "X-API-Key: vai_your_key_here" https://voice.swaa.life/api/v1/usage/quota

Authentication Flow

X-API-Key header ▶ SHA-256 hash ▶ DB lookup by hash ▶ Check active & expiry ▶ Proceed

For admin endpoints, an additional check ensures is_admin = true. Non-admin keys receive 403 Forbidden.
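The flow above can be sketched in Python as follows. This is a minimal illustration, not the platform's actual code: the `KeyRecord` class, the `authenticate` function, and the in-memory dictionary standing in for the database lookup are all hypothetical, and timezone-aware UTC timestamps are assumed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class KeyRecord:
    """Hypothetical subset of an api_keys row."""
    key_hash: str
    is_active: bool = True
    is_admin: bool = False
    expires_at: Optional[datetime] = None  # assumed timezone-aware (UTC)


def authenticate(raw_key, keys_by_hash, require_admin=False, now=None):
    """Return (status_code, record) following the documented flow."""
    now = now or datetime.now(timezone.utc)
    if not raw_key:
        return 401, None  # missing X-API-Key header
    digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    record = keys_by_hash.get(digest)  # stands in for the DB lookup by hash
    if record is None or not record.is_active:
        return 401, None  # unknown or revoked key
    if record.expires_at is not None and record.expires_at <= now:
        return 401, None  # expired key
    if require_admin and not record.is_admin:
        return 403, None  # valid key without admin privileges
    return 200, record
```

Because only the hash is compared, the raw key never needs to be stored or logged on the server side.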

Key Format

Keys follow the format vai_<32 hex characters> (36 characters total). Example:

vai_fb0ee9a84bb95cc5673dc0adaaff6ac2
│   └────────── 32 hex chars (secrets.token_hex(16)) ──────────┘
└── prefix

Note: The raw key is only shown at creation time. Only the SHA-256 hash is stored server-side. Lost keys cannot be recovered — create a new one and revoke the old one.
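As a rough sketch of how a key in this format could be produced, the snippet below combines the `vai_` prefix with `secrets.token_hex(16)` and derives the stored hash and the 8-character display prefix. The function name `generate_api_key` is hypothetical, and UTF-8 encoding before hashing is an assumption.

```python
import hashlib
import secrets

KEY_PREFIX = "vai_"


def generate_api_key():
    """Return (raw_key, key_hash, key_prefix) for a new key."""
    raw_key = KEY_PREFIX + secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()  # 64 hex chars
    key_prefix = raw_key[:8]  # e.g. "vai_fb0e", safe to display in listings
    return raw_key, key_hash, key_prefix
```

Only `key_hash` and `key_prefix` would be persisted; `raw_key` is returned to the caller once and then discarded.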

POST /api/v1/tts

POST /api/v1/tts Synthesize text to speech

Returns audio/mpeg binary (MP3, 48kbps mono). Identical requests are served from Redis cache (~1-3ms) for 1 hour.

Request Headers

Header	Required	Description
X-API-Key	Yes	Your API key
Content-Type	Yes	application/json

Request Body

Field	Type	Required	Default	Description
text	string	Yes	—	Text to synthesize (1–5000 chars)
voice	string	Yes	—	Voice ID (e.g., `ta-IN-PallaviNeural`)
rate	string	No	+0%	Speed: `-50%` to `+50%`
pitch	string	No	+0Hz	Pitch: `-20Hz` to `+20Hz`
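A client can check these ranges before sending a request. The sketch below is one way to do that; `build_tts_body` is a hypothetical helper, and it assumes that `rate` and `pitch` always carry an explicit sign (as the defaults do) and that the character limit counts Unicode code points, which is what Python's `len` measures.

```python
import re

MAX_CHARS = 5000  # documented upper bound for `text`
_RATE_RE = re.compile(r"([+-])(\d{1,3})%")
_PITCH_RE = re.compile(r"([+-])(\d{1,3})Hz")


def build_tts_body(text, voice, rate="+0%", pitch="+0Hz"):
    """Validate the documented field ranges and return a JSON-ready dict."""
    if not 1 <= len(text) <= MAX_CHARS:
        raise ValueError("text must be 1-5000 characters")
    rate_match = _RATE_RE.fullmatch(rate)
    if rate_match is None or int(rate_match.group(2)) > 50:
        raise ValueError("rate must be between -50% and +50%")
    pitch_match = _PITCH_RE.fullmatch(pitch)
    if pitch_match is None or int(pitch_match.group(2)) > 20:
        raise ValueError("pitch must be between -20Hz and +20Hz")
    return {"text": text, "voice": voice, "rate": rate, "pitch": pitch}
```

Validating locally avoids spending a rate-limited request on a body the server would reject with 400.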

Example Request

curl -X POST https://voice.swaa.life/api/v1/tts \
  -H "X-API-Key: vai_your_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "வணக்கம் உலகம்", "voice": "ta-IN-PallaviNeural"}' \
  --output speech.mp3
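The same request can be issued from Python using only the standard library. This is an illustrative sketch: `make_tts_request` and `synthesize_to_file` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://voice.swaa.life"


def make_tts_request(api_key, text, voice):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v1/tts",
        data=payload,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def synthesize_to_file(api_key, text, voice, path):
    """Send the request, write the MP3 body to `path`, return response headers."""
    request = make_tts_request(api_key, text, voice)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
        headers = dict(response.headers.items())
    with open(path, "wb") as out:
        out.write(audio)
    return headers
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers would catch that exception to handle the error codes listed for this endpoint.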

Response: `200 OK`

Content-Type: audio/mpeg — MP3 audio binary

Response Headers

Header	Example	Description
X-Processing-Time-Ms	1234	Server processing time (ms)
X-Chars-Processed	13	Input text character count
X-Audio-Bytes	8640	Audio file size (bytes)
X-Audio-Duration-Ms	1440	Estimated audio duration (ms)
X-Cache-Hit	false	`true` if served from Redis

Errors

Code	Condition
400	Invalid voice ID, text too long, or malformed JSON
401	Missing, invalid, expired, or revoked API key
429	Rate limit exceeded or monthly quota exceeded
500	TTS engine failure

POST /api/v1/tts/stream

POST /api/v1/tts/stream Streaming TTS for real-time playback

Accepts the same request body as /api/v1/tts and returns a chunked audio/mpeg stream. Audio chunks are sent as they are generated, so playback can begin before synthesis of the full text has finished. This makes the endpoint a good fit for long texts.

Example

curl -X POST https://voice.swaa.life/api/v1/tts/stream \
  -H "X-API-Key: vai_your_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Long text here...", "voice": "en-US-JennyNeural"}' \
  --output stream.mp3

Note: Streaming responses bypass the Redis cache. Each streaming request hits the Microsoft TTS API.
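On the client side, a streamed response can be consumed chunk by chunk rather than read in one call. The helper below is a minimal sketch; `copy_stream` is a hypothetical name and the 4096-byte chunk size is arbitrary. It works with any file-like object, including the response returned by `urllib.request.urlopen`.

```python
import io


def copy_stream(response, out, chunk_size=4096):
    """Copy a file-like response to `out` chunk by chunk; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break  # end of stream
        out.write(chunk)  # a player could start consuming the data here
        total += len(chunk)
    return total
```

Writing each chunk as it arrives keeps memory use flat regardless of how long the synthesized audio is.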

GET /api/v1/voices

GET /api/v1/voices List all available voices (no auth required)

Response: `200 OK`

{
  "voices": [
    {
      "id": "ta-IN-PallaviNeural",
      "name": "Pallavi",
      "language": "Tamil",
      "language_code": "ta-IN",
      "gender": "Female",
      "sample_text": "வணக்கம், இது ஒரு குரல் சோதனை."
    },
    // ... 14 voices total
  ],
  "total": 14,
  "languages": ["English (India)", "English (UK)", "English (US)", "Hindi", "Malayalam", "Tamil", "Telugu"]
}

GET /api/v1/usage

GET /api/v1/usage Comprehensive usage stats for your key

Query Parameters

Param	Type	Default	Description
days	int	30	Number of days to aggregate

Response: `200 OK`

{
  "total_requests": 142,
  "total_chars": 28500,
  "total_audio_bytes": 4521984,
  "total_audio_duration_ms": 753664,
  "cache_hit_rate": 0.35,
  "avg_response_ms": 1250,
  "period_start": "2026-01-08T00:00:00",
  "period_end": "2026-02-07T18:15:00",
  "by_language": { "ta-IN": 50, "en-US": 92 },
  "by_voice": { "ta-IN-PallaviNeural": 50, "en-US-JennyNeural": 92 },
  "by_status": { "200": 140, "429": 2 },
  "daily": [
    {
      "date": "2026-02-01", "requests": 23, "chars": 4500,
      "audio_bytes": 720000, "cache_hits": 8, "errors": 0,
      "avg_response_ms": 1100
    }
  ]
}

GET /api/v1/usage/logs

GET /api/v1/usage/logs Individual request logs for your key

Query Parameters

Param	Type	Default	Description
limit	int	50	Results per page (max 200)
offset	int	0	Pagination offset

Response: `200 OK`

[{
  "id": 1,
  "endpoint": "/api/v1/tts",
  "method": "POST",
  "voice": "ta-IN-PallaviNeural",
  "language": "ta-IN",
  "chars_processed": 42,
  "audio_bytes": 8640,
  "response_time_ms": 1234,
  "status_code": 200,
  "cache_hit": false,
  "client_ip": "72.61.243.39",
  "created_at": "2026-02-07T18:15:00"
}]
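To read more than one page of logs, a client can advance `offset` by `limit` until a page comes back short. The generator below sketches that loop; `fetch_page` is a hypothetical callable that performs the GET request and returns the decoded JSON list, and the lower bound of 1 for `limit` is an assumption (only the maximum of 200 is documented).

```python
def iter_usage_logs(fetch_page, limit=50):
    """Yield every log entry by walking `offset` until a short page arrives."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        if len(page) < limit:
            break  # fewer rows than requested means this was the last page
        offset += limit
```

Passing the fetch function in keeps the paging logic independent of the HTTP library in use.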

GET /api/v1/usage/quota

GET /api/v1/usage/quota Current quota & lifetime counters

Response: `200 OK` (example for a key with a monthly quota)

{
  "monthly_char_limit": 1000000,       // 0 = unlimited
  "monthly_chars_used": 245000,
  "monthly_chars_remaining": 755000,    // null if unlimited
  "unlimited": false,
  "quota_resets_at": "2026-03-01T00:00:00",
  "rate_limit": 60,
  "total_requests": 1420,              // lifetime
  "total_chars": 285000,               // lifetime
  "total_audio_bytes": 45219840         // lifetime
}

POST /admin/api/keys

Admin required. All admin endpoints require an API key with is_admin = true. Non-admin keys receive 403.

POST /admin/api/keys Create a new API key

Request Body

Field	Type	Required	Default	Description
name	string	Yes	—	Human-readable name (1–100 chars)
description	string	No	""	Optional description (max 500)
rate_limit	int	No	60	Requests per minute (1–1000)
monthly_char_limit	int	No	0	Monthly character quota (0 = unlimited)
is_admin	bool	No	false	Grant admin privileges
allowed_voices	string	No	null	Comma-separated voice IDs (null = all)

Example

curl -X POST https://voice.swaa.life/admin/api/keys \
  -H "X-API-Key: vai_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Mobile App",
    "description": "Production key for iOS app",
    "rate_limit": 100,
    "monthly_char_limit": 2000000
  }'

Response: `200 OK`

{
  "id": "a1b2c3d4-e5f6-...",
  "name": "Mobile App",
  "description": "Production key for iOS app",
  "key_prefix": "vai_3f8a",
  "is_admin": false,
  "is_active": true,
  "rate_limit": 100,
  "monthly_char_limit": 2000000,
  "monthly_chars_used": 0,
  "total_requests": 0,
  "total_chars": 0,
  "total_audio_bytes": 0,
  "created_at": "2026-02-07T18:00:00",
  "api_key": "vai_3f8a1b2c3d4e5f6a7b8c9d0e1f2a3b4c"  // SHOWN ONLY ONCE
}

Important: The api_key field is only returned at creation time. Store it securely — it cannot be retrieved again.

GET /admin/api/keys

GET /admin/api/keys List all API keys with usage counters

Query Parameters

Param	Type	Default	Description
include_inactive	bool	false	Include revoked keys

Response: `200 OK`

[{
  "id": "619a4341-...",
  "name": "Bootstrap Admin",
  "description": null,
  "key_prefix": "vai_fb0e",
  "is_admin": true,
  "is_active": true,
  "rate_limit": 1000,
  "monthly_char_limit": 0,
  "monthly_chars_used": 0,
  "total_requests": 142,
  "total_chars": 28500,
  "total_audio_bytes": 4521984,
  "created_at": "2026-02-07T10:00:00",
  "last_used_at": "2026-02-07T18:15:00",
  "expires_at": null
}]

DELETE /admin/api/keys/{id}

DELETE /admin/api/keys/{id} Revoke an API key (soft delete)

Sets is_active = false. The key record remains for audit purposes, and authentication with the revoked key fails immediately.

Response: `200 OK`

{ "detail": "API key revoked." }

Errors

Code	Condition
404	Key ID not found

GET /admin/api/keys/{id}/usage

GET /admin/api/keys/{id}/usage Detailed usage for a specific key

Query Parameters

Param	Type	Default	Description
days	int	30	Number of days to query

Response: `200 OK`

{
  "key": {
    "id": "a1b2c3d4-...", "name": "Mobile App",
    "total_requests": 500, "total_chars": 100000,
    "total_audio_bytes": 16000000,
    "monthly_char_limit": 500000, "monthly_chars_used": 45000
  },
  "recent_logs": [{
    "id": 100, "endpoint": "/api/v1/tts",
    "voice": "ta-IN-PallaviNeural", "chars": 42,
    "audio_bytes": 8640, "response_ms": 1234,
    "status": 200, "cache_hit": false,
    "client_ip": "72.61.243.39",
    "user_agent": "Mozilla/5.0...",
    "created_at": "2026-02-07T18:15:00"
  }],
  "daily": [{
    "date": "2026-02-07", "requests": 23, "chars": 4500,
    "audio_bytes": 720000, "cache_hits": 8,
    "errors": 0, "avg_response_ms": 1100
  }]
}

GET /admin/api/stats

GET /admin/api/stats Platform-wide statistics

Query Parameters

Param	Type	Default	Description
days	int	30	Number of days to aggregate

Response: `200 OK`

{
  "total_keys": 5,  "active_keys": 4,
  "total_requests": 1420,  "total_chars": 285000,
  "total_audio_bytes": 45219840,
  "avg_response_ms": 1250,  "cache_hit_rate": 0.35,
  "top_voices": [{ "voice": "ta-IN-PallaviNeural", "requests": 500 }],
  "top_languages": [{ "language": "ta-IN", "requests": 600 }],
  "top_keys": [{ "key_id": "a1b2...", "key_name": "Mobile App", "requests": 500, "chars": 100000 }],
  "daily_trend": [{ "date": "2026-02-01", "requests": 100, "chars": 20000, "errors": 0, "cache_hits": 35 }],
  "requests_today": 45,  "chars_today": 9000,  "errors_today": 0
}

Rate Limits & Quotas

Per-Key Rate Limiting

Each API key has a configurable rate_limit (requests per minute, default 60). The limit is enforced with a sliding-window algorithm backed by a Redis sorted set. When the limit is exceeded:

HTTP 429 Too Many Requests
Retry-After: 60

{ "detail": "Rate limit exceeded. 60 requests per 60s allowed." }
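The sliding-window idea can be illustrated without Redis. The class below is an in-memory stand-in, not the platform's implementation: it keeps one timestamp per accepted request and mirrors the sorted-set operations such a limiter typically relies on (`ZREMRANGEBYSCORE` to expire old entries, `ZCARD` to count, `ZADD` to record). The class and method names are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory stand-in for a Redis sorted-set sliding window."""

    def __init__(self, limit=60, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key id -> timestamps, oldest first

    def allow(self, key_id, now=None):
        now = time.time() if now is None else now
        hits = self._hits[key_id]
        # Drop entries that fell out of the window (ZREMRANGEBYSCORE in Redis).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:  # ZCARD in Redis
            return False  # caller would answer 429 with Retry-After
        hits.append(now)  # ZADD in Redis, scored by timestamp
        return True
```

Unlike a fixed per-minute counter, this approach never lets a burst straddling a minute boundary exceed the limit, because the window always covers the most recent 60 seconds.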

Monthly Character Quotas

Each key can carry an optional monthly character limit. When monthly_char_limit > 0, the text length of every request is counted against it. Quotas reset automatically on the 1st of each month (UTC). When the quota is exceeded:

HTTP 429 Too Many Requests

{
  "detail": "Monthly character quota exceeded.",
  "quota": 1000000,
  "used": 1000000,
  "remaining": 0,
  "resets_at": "2026-03-01T00:00:00"
}
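The quota arithmetic can be sketched as below. Both function names are hypothetical, and one detail is an assumption: the sketch rejects a request whose text does not fit in the remaining allowance, whereas the server might instead reject only once the quota is already used up.

```python
from datetime import datetime, timezone


def next_reset(now):
    """First day of the following month at 00:00 UTC."""
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


def check_quota(limit, used, text_len):
    """Return (allowed, remaining); a limit of 0 means unlimited."""
    if limit == 0:
        return True, None  # mirrors "remaining: null" for unlimited keys
    remaining = max(limit - used, 0)
    return text_len <= remaining, remaining
```

With the values from the quota endpoint example (limit 1000000, used 245000), `check_quota` reports 755000 characters remaining.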

Ingress Rate Limiting

Nginx Ingress applies a separate per-IP rate limit: 50 requests/second with 5x burst (250). This is independent of the per-key API rate limit.

Caching

TTS responses are cached in Redis for 1 hour (configurable). The cache key is:

SHA-256("text|voice|rate|pitch") → tts:cache:<hash>

Identical requests with the same text, voice, rate, and pitch combination return the cached audio in ~1-3ms instead of ~1500ms from the Microsoft TTS API. Cache hits are tracked in usage logs (cache_hit: true).
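The key derivation can be written in a few lines of Python. This sketch follows the documented formula; the function name `tts_cache_key`, the UTF-8 encoding, the hex digest, and the substitution of default `rate` and `pitch` values before hashing are assumptions.

```python
import hashlib


def tts_cache_key(text, voice, rate="+0%", pitch="+0Hz"):
    """Build the Redis key for a TTS request from its four parameters."""
    material = "|".join([text, voice, rate, pitch])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return "tts:cache:" + digest
```

Any change to one of the four parameters yields a different key, so a request with `rate="+10%"` never collides with the default-rate entry for the same text.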

Streaming bypass: /api/v1/tts/stream does not use the cache. Only /api/v1/tts reads and writes the cache.

Error Codes

Code	Meaning	Detail
400	Bad Request	Invalid voice ID, text exceeds 5000 chars, malformed JSON body
401	Unauthorized	Missing `X-API-Key` header, invalid key, expired key, or revoked key
403	Forbidden	Non-admin key used on admin endpoint
404	Not Found	Invalid key ID in admin endpoint
429	Too Many Requests	Rate limit exceeded (includes `Retry-After` header) or monthly quota exceeded (includes quota details)
500	Server Error	TTS engine failure, database error, or unhandled exception

All errors return JSON with a detail field:

{ "detail": "Error description here" }

Response Headers (TTS)

Header	Type	Description
`X-Processing-Time-Ms`	int	Server-side processing time in milliseconds
`X-Chars-Processed`	int	Number of characters in the input text
`X-Audio-Bytes`	int	Size of the returned audio in bytes
`X-Audio-Duration-Ms`	int	Estimated audio duration in milliseconds (based on 48kbps bitrate)
`X-Cache-Hit`	bool	`true` if served from Redis cache, `false` if freshly synthesized
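The duration estimate follows directly from the constant 48kbps bitrate: bytes times 8 gives bits, and bits divided by 48,000 gives seconds. The helper below is a sketch of that arithmetic; the function name and the choice of integer (floor) division are assumptions.

```python
BITRATE_BPS = 48_000  # 48 kbps, per the documented MP3 output


def estimate_duration_ms(audio_bytes, bitrate_bps=BITRATE_BPS):
    """Estimate playback time in milliseconds for constant-bitrate audio."""
    return audio_bytes * 8 * 1000 // bitrate_bps
```

This reproduces the figures used elsewhere in this document: 8640 bytes maps to 1440 ms, and 4521984 bytes maps to 753664 ms.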

Database Schema

api_keys

Column	Type	Description
id	UUID	Primary key
name	VARCHAR(100)	Human-readable key name
key_hash	VARCHAR(64)	SHA-256 hash of full key (indexed, unique)
key_prefix	VARCHAR(8)	Display prefix (e.g., vai_fb0e)
is_admin	BOOLEAN	Admin access flag
is_active	BOOLEAN	Revocation flag (soft delete)
rate_limit	INTEGER	Requests per minute
monthly_char_limit	BIGINT	0 = unlimited
monthly_chars_used	BIGINT	Current month character usage
quota_reset_at	TIMESTAMP	First of next month (auto-resets)
description	TEXT	Optional description
allowed_voices	TEXT	Comma-separated voice filter (null = all)
allowed_ips	TEXT	Comma-separated CIDR filter
total_requests	BIGINT	Lifetime request counter
total_chars	BIGINT	Lifetime character counter
total_audio_bytes	BIGINT	Lifetime audio size counter
created_at	TIMESTAMP	Key creation time
last_used_at	TIMESTAMP	Last API call time
expires_at	TIMESTAMP	Optional expiry date

usage_logs

Column	Type	Description
id	BIGSERIAL	Primary key
api_key_id	VARCHAR(36)	Foreign key to api_keys
api_key_name	VARCHAR(100)	Denormalized for fast queries
endpoint	VARCHAR(100)	Request path
method	VARCHAR(10)	HTTP method
client_ip	VARCHAR(45)	Client IP (from X-Forwarded-For)
user_agent	VARCHAR(500)	User-Agent header (truncated)
language	VARCHAR(10)	Language code (ta-IN, en-US, etc.)
voice	VARCHAR(50)	Voice ID used
chars_processed	INTEGER	Input text character count
text_hash	VARCHAR(16)	First 16 chars of SHA-256 (dedup analysis)
audio_bytes	INTEGER	Output audio size
audio_duration_ms	INTEGER	Estimated audio duration
response_time_ms	INTEGER	Server processing time
status_code	INTEGER	HTTP status returned
cache_hit	BOOLEAN	Whether served from cache
error_detail	VARCHAR(500)	Error message (if failed)
rate	VARCHAR(10)	Speed adjustment used
pitch	VARCHAR(10)	Pitch adjustment used
created_at	TIMESTAMP	Request timestamp

daily_usage

Column	Type	Description
id	BIGSERIAL	Primary key
api_key_id	VARCHAR(36)	Foreign key to api_keys
date	VARCHAR(10)	YYYY-MM-DD
requests	INTEGER	Request count
chars_processed	BIGINT	Total characters
audio_bytes	BIGINT	Total audio generated
cache_hits	INTEGER	Cache hit count
errors	INTEGER	Error count
avg_response_ms	INTEGER	Running average response time
language_breakdown	TEXT	JSON: {"ta-IN": 5, "en-US": 3}
voice_breakdown	TEXT	JSON: {"ta-IN-PallaviNeural": 5}

Configuration

All environment variables use the VOICEAI_ prefix and are set via Kubernetes ConfigMap/Secret.

Variable	Default	Description
VOICEAI_APP_NAME	Voice AI Platform	Application display name
VOICEAI_APP_VERSION	1.0.0	Version string
VOICEAI_DEBUG	false	Enable debug mode (exposes Swagger at /docs)
VOICEAI_DATABASE_URL	postgresql+asyncpg://...	PostgreSQL async connection string
VOICEAI_REDIS_URL	redis://...6379/2	Redis connection string (DB index 2)
VOICEAI_ADMIN_API_KEY	""	Bootstrap admin key (auto-inserted on startup)
VOICEAI_RATE_LIMIT_REQUESTS	60	Default requests per minute for new keys
VOICEAI_RATE_LIMIT_WINDOW	60	Rate limit window in seconds
VOICEAI_TTS_MAX_CHARS	5000	Maximum characters per TTS request
VOICEAI_TTS_CACHE_TTL	3600	Redis cache TTL in seconds (1 hour)
VOICEAI_CORS_ORIGINS	*	Comma-separated allowed CORS origins
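A minimal sketch of reading a few of these variables with their documented defaults is shown below. It uses only the standard library and is not the application's actual settings code; the `Settings` class, the `load_settings` function, and the set of strings accepted as a true value for `VOICEAI_DEBUG` are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    rate_limit_requests: int
    rate_limit_window: int
    tts_max_chars: int
    tts_cache_ttl: int
    debug: bool
    cors_origins: list


def load_settings(env=None):
    """Read VOICEAI_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def get(name, default):
        return env.get("VOICEAI_" + name, default)

    origins = get("CORS_ORIGINS", "*").split(",")
    return Settings(
        rate_limit_requests=int(get("RATE_LIMIT_REQUESTS", "60")),
        rate_limit_window=int(get("RATE_LIMIT_WINDOW", "60")),
        tts_max_chars=int(get("TTS_MAX_CHARS", "5000")),
        tts_cache_ttl=int(get("TTS_CACHE_TTL", "3600")),
        debug=get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        cors_origins=[o.strip() for o in origins if o.strip()],
    )
```

Accepting an explicit `env` mapping makes the loader easy to exercise in tests without touching the process environment.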

Security

Transport

All traffic is encrypted via TLS 1.2+. HTTP requests are 301-redirected to HTTPS. Certificates are auto-provisioned and renewed by cert-manager + Let's Encrypt.

Data Protection

Data	Storage	Protection
API keys (raw)	Never stored	SHA-256 hashed before persistence
Input text	Not stored	Only the first 16 chars of its SHA-256 hash are logged, for dedup analysis
Audio output	Redis (1h TTL)	Ephemeral cache, auto-evicted
Client IPs	usage_logs table	Logged for audit (consider retention policy)
DB/Redis passwords	K8s Secret	Base64-encoded in the Secret (an encoding, not encryption); TLS in transit

Access Control

Role	Accessible Endpoints
Public	`GET /health`, `GET /api/v1/voices`
User (any valid key)	`POST /api/v1/tts`, `POST /api/v1/tts/stream`, `GET /api/v1/usage*`
Admin (is_admin=true)	All user endpoints + `GET/POST/DELETE /admin/api/*`
