Voice AI Platform

Complete API & platform documentation for the multilingual Text-to-Speech REST API.

Platform Overview

Voice AI is a production-grade TTS API offering 14 neural voices across 7 language variants (Tamil, Hindi, Telugu, Malayalam, and English for India, the US, and the UK). It serves cached responses in about 1-3 ms and provides per-key rate limiting, monthly character quotas, and request-level audit logging.

Neural Voices: 14
Language Variants: 7
Cached Response: <3ms
GPU Required: 0
MP3 Output: 48kbps

Architecture

Client ──HTTPS──▶ Nginx Ingress (TLS + rate limit 50rps)
                        │
                        ▼
          FastAPI Pod (uvicorn, 2 async workers)
           │            │             │
           ▼            ▼             ▼
        API Key     Rate Limit    TTS Engine
        Lookup      Check         (edge-tts)
      (PostgreSQL)  (Redis)           │
                                      ▼
                        Microsoft Edge Neural TTS (WSS)
                                      │
                 ┌────────────────────┘
                 ▼
            Redis Cache ──▶ Return audio/mpeg
       (1h TTL, SHA-256 keyed)
                 │
                 ▼
        Usage Logging (PostgreSQL)
          ├─ usage_logs (per-request)
          ├─ daily_usage (aggregated)
          └─ api_keys (counters updated)

Supported Voices & Languages

| Language | Code | Female Voice | Male Voice | Sample Text |
|---|---|---|---|---|
| Tamil | ta-IN | ta-IN-PallaviNeural | ta-IN-ValluvarNeural | வணக்கம், இது ஒரு குரல் சோதனை. |
| Hindi | hi-IN | hi-IN-SwaraNeural | hi-IN-MadhurNeural | नमस्ते, यह एक आवाज़ परीक्षण है। |
| Telugu | te-IN | te-IN-ShrutiNeural | te-IN-MohanNeural | నమస్కారం, ఇది ఒక వాయిస్ టెస్ట్. |
| Malayalam | ml-IN | ml-IN-SobhanaNeural | ml-IN-MidhunNeural | നമസ്കാരം, ഇതൊരു ശബ്ദ പരിശോധനയാണ്. |
| English (IN) | en-IN | en-IN-NeerjaNeural | en-IN-PrabhatNeural | Hello, this is a voice test. |
| English (US) | en-US | en-US-JennyNeural | en-US-GuyNeural | Hello, this is a voice test. |
| English (UK) | en-GB | en-GB-SoniaNeural | en-GB-RyanNeural | Hello, this is a voice test. |

Authentication

All API endpoints (except /health and /api/v1/voices) require an API key passed via the X-API-Key header.

curl -H "X-API-Key: vai_your_key_here" https://voice.swaa.life/api/v1/usage/quota

Authentication Flow

X-API-Key header → SHA-256 hash → DB lookup by hash → Check active & expiry → Proceed

For admin endpoints, an additional check ensures is_admin = true. Non-admin keys receive 403 Forbidden.
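The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KeyRecord`, `authenticate`, and the in-memory `store` dictionary are hypothetical stand-ins for the real PostgreSQL lookup, and the mapping of each failure to 401 or 403 follows the error tables in this document.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class KeyRecord:
    """Hypothetical in-memory stand-in for a row of the api_keys table."""
    key_hash: str
    is_active: bool = True
    is_admin: bool = False
    expires_at: Optional[datetime] = None


def authenticate(
    raw_key: Optional[str],
    store: Dict[str, KeyRecord],
    now: Optional[datetime] = None,
    require_admin: bool = False,
) -> Tuple[int, Optional[KeyRecord]]:
    """Return (HTTP status, record) following the documented flow."""
    if not raw_key:
        return 401, None  # missing X-API-Key header
    # Only the SHA-256 hash is stored, so the lookup is done by hash.
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    record = store.get(key_hash)
    if record is None or not record.is_active:
        return 401, None  # unknown or revoked key
    now = now or datetime.now(timezone.utc)
    if record.expires_at is not None and record.expires_at <= now:
        return 401, None  # expired key
    if require_admin and not record.is_admin:
        return 403, None  # valid key, but not an admin key
    return 200, record
```

Checking the admin flag last means a revoked or expired admin key still yields 401 rather than 403, which matches the error table.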

Key Format

Keys follow the format vai_<32 hex characters> (36 characters total). Example:

vai_fb0ee9a84bb95cc5673dc0adaaff6ac2
│   └── 32 hex chars (secrets.token_hex(16))
└── prefix
Note: The raw key is only shown at creation time. Only the SHA-256 hash is stored server-side. Lost keys cannot be recovered — create a new one and revoke the old one.
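As an illustration of the key format, the sketch below generates a key with `secrets.token_hex(16)` and derives the values that would be persisted. The function name `generate_api_key` and the returned tuple layout are assumptions made for this example; only the prefix, the 32 hex characters, and the SHA-256 hash come from the description above.

```python
import hashlib
import secrets
from typing import Tuple


def generate_api_key(prefix: str = "vai_") -> Tuple[str, str, str]:
    """Return (raw_key, key_hash, key_prefix) for a new key.

    raw_key    : shown to the caller once, never stored
    key_hash   : SHA-256 hex digest, the only form kept server-side
    key_prefix : first 8 characters, safe to display (e.g. vai_fb0e)
    """
    raw_key = prefix + secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return raw_key, key_hash, raw_key[:8]
```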

POST /api/v1/tts

POST /api/v1/tts Synthesize text to speech

Returns audio/mpeg binary (MP3, 48kbps mono). Identical requests are served from Redis cache (~1-3ms) for 1 hour.

Request Headers

| Header | Required | Description |
|---|---|---|
| X-API-Key | Yes | Your API key |
| Content-Type | Yes | application/json |

Request Body

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | | Text to synthesize (1–5000 chars) |
| voice | string | Yes | | Voice ID (e.g., ta-IN-PallaviNeural) |
| rate | string | No | +0% | Speed: -50% to +50% |
| pitch | string | No | +0Hz | Pitch: -20Hz to +20Hz |
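A client can check a request body against these limits before sending it. The following is a minimal sketch, not the server's validator: `validate_tts_body` is a hypothetical helper, and it assumes `rate` and `pitch` always carry an explicit sign, as the defaults `+0%` and `+0Hz` suggest.

```python
import re
from typing import Any, Dict, List

MAX_CHARS = 5000  # documented upper bound for "text"


def _in_range(value: Any, unit: str, limit: int) -> bool:
    """True if value looks like '+10%' or '-5Hz' and stays within +/- limit."""
    if not isinstance(value, str):
        return False
    match = re.fullmatch(r"([+-])(\d+)" + re.escape(unit), value)
    return bool(match) and int(match.group(2)) <= limit


def validate_tts_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors = []
    text = body.get("text")
    if not isinstance(text, str) or not 1 <= len(text) <= MAX_CHARS:
        errors.append("text must be a string of 1-5000 characters")
    voice = body.get("voice")
    if not isinstance(voice, str) or not voice:
        errors.append("voice is required")
    if not _in_range(body.get("rate", "+0%"), "%", 50):
        errors.append("rate must be between -50% and +50%")
    if not _in_range(body.get("pitch", "+0Hz"), "Hz", 20):
        errors.append("pitch must be between -20Hz and +20Hz")
    return errors
```

The helper checks only the shape of `voice`; whether the ID actually exists can be confirmed against GET /api/v1/voices.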

Example Request

curl -X POST https://voice.swaa.life/api/v1/tts \
  -H "X-API-Key: vai_your_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "வணக்கம் உலகம்", "voice": "ta-IN-PallaviNeural"}' \
  --output speech.mp3

Response: 200 OK

Content-Type: audio/mpeg — MP3 audio binary

Response Headers

| Header | Example | Description |
|---|---|---|
| X-Processing-Time-Ms | 1234 | Server processing time (ms) |
| X-Chars-Processed | 13 | Input text character count |
| X-Audio-Bytes | 8640 | Audio file size (bytes) |
| X-Audio-Duration-Ms | 1440 | Estimated audio duration (ms) |
| X-Cache-Hit | false | true if served from Redis |

Errors

| Code | Condition |
|---|---|
| 400 | Invalid voice ID, text too long, or malformed JSON |
| 401 | Missing, invalid, expired, or revoked API key |
| 429 | Rate limit exceeded or monthly quota exceeded |
| 500 | TTS engine failure |
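For callers who prefer Python over curl, the sketch below builds the same request with the standard library and parses the documented response headers into typed values. `build_tts_request`, `TtsMeta`, and `parse_tts_headers` are hypothetical names chosen for this example; the URL and header names are taken from this page.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Mapping

BASE_URL = "https://voice.swaa.life"  # host used in the curl examples


def build_tts_request(api_key: str, text: str, voice: str,
                      rate: str = "+0%", pitch: str = "+0Hz") -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/tts request."""
    body = json.dumps({"text": text, "voice": voice, "rate": rate, "pitch": pitch})
    return urllib.request.Request(
        BASE_URL + "/api/v1/tts",
        data=body.encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


@dataclass
class TtsMeta:
    processing_ms: int
    chars: int
    audio_bytes: int
    duration_ms: int
    cache_hit: bool


def parse_tts_headers(headers: Mapping[str, str]) -> TtsMeta:
    """Convert the X-* response headers into typed values."""
    return TtsMeta(
        processing_ms=int(headers["X-Processing-Time-Ms"]),
        chars=int(headers["X-Chars-Processed"]),
        audio_bytes=int(headers["X-Audio-Bytes"]),
        duration_ms=int(headers["X-Audio-Duration-Ms"]),
        cache_hit=headers["X-Cache-Hit"].strip().lower() == "true",
    )
```

To send the request, pass it to `urllib.request.urlopen`, read the MP3 bytes from the response, and hand `response.headers` to `parse_tts_headers`. A non-2xx status raises `urllib.error.HTTPError`, whose `code` corresponds to the error table above.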

POST /api/v1/tts/stream

POST /api/v1/tts/stream Streaming TTS for real-time playback

Accepts the same request body as /api/v1/tts and returns a chunked audio/mpeg stream. Audio chunks are sent as they are generated, so playback can begin before the full audio is ready. Ideal for long texts.

Example

curl -X POST https://voice.swaa.life/api/v1/tts/stream \
  -H "X-API-Key: vai_your_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Long text here...", "voice": "en-US-JennyNeural"}' \
  --output stream.mp3
Note: Streaming responses bypass the Redis cache. Each streaming request hits the Microsoft TTS API.
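On the client side, a chunked response is typically consumed in a read loop so that bytes can be written (or handed to a player) as they arrive. The helper below is a minimal sketch; `save_stream` is a hypothetical name, and it works with any object exposing `read(n)`, such as the response returned by `urllib.request.urlopen`.

```python
from typing import BinaryIO


def save_stream(response: BinaryIO, out: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy a streaming response to `out` chunk by chunk; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        out.write(chunk)
        total += len(chunk)
    return total
```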

GET /api/v1/voices

GET /api/v1/voices List all available voices (no auth required)

Response: 200 OK

{
  "voices": [
    {
      "id": "ta-IN-PallaviNeural",
      "name": "Pallavi",
      "language": "Tamil",
      "language_code": "ta-IN",
      "gender": "Female",
      "sample_text": "வணக்கம், இது ஒரு குரல் சோதனை."
    },
    // ... 14 voices total
  ],
  "total": 14,
  "languages": ["English (India)", "English (UK)", "English (US)", "Hindi", "Malayalam", "Tamil", "Telugu"]
}

GET /api/v1/usage

GET /api/v1/usage Comprehensive usage stats for your key

Query Parameters

| Param | Type | Default | Description |
|---|---|---|---|
| days | int | 30 | Number of days to aggregate |

Response: 200 OK

{
  "total_requests": 142,
  "total_chars": 28500,
  "total_audio_bytes": 4521984,
  "total_audio_duration_ms": 753664,
  "cache_hit_rate": 0.35,
  "avg_response_ms": 1250,
  "period_start": "2026-01-08T00:00:00",
  "period_end": "2026-02-07T18:15:00",
  "by_language": { "ta-IN": 50, "en-US": 92 },
  "by_voice": { "ta-IN-PallaviNeural": 50, "en-US-JennyNeural": 92 },
  "by_status": { "200": 140, "429": 2 },
  "daily": [
    {
      "date": "2026-02-01",
      "requests": 23,
      "chars": 4500,
      "audio_bytes": 720000,
      "cache_hits": 8,
      "errors": 0,
      "avg_response_ms": 1100
    }
  ]
}

GET /api/v1/usage/logs

GET /api/v1/usage/logs Individual request logs for your key

Query Parameters

| Param | Type | Default | Description |
|---|---|---|---|
| limit | int | 50 | Results per page (max 200) |
| offset | int | 0 | Pagination offset |

Response: 200 OK

[
  {
    "id": 1,
    "endpoint": "/api/v1/tts",
    "method": "POST",
    "voice": "ta-IN-PallaviNeural",
    "language": "ta-IN",
    "chars_processed": 42,
    "audio_bytes": 8640,
    "response_time_ms": 1234,
    "status_code": 200,
    "cache_hit": false,
    "client_ip": "72.61.243.39",
    "created_at": "2026-02-07T18:15:00"
  }
]
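Walking through all log entries means repeating the call with an increasing `offset` until a short or empty page comes back. The generator below sketches that loop. `iter_logs` and the `fetch_page` callback are hypothetical; `fetch_page(limit, offset)` is assumed to perform the GET request and return the decoded JSON list.

```python
from typing import Callable, Dict, Iterator, List

MAX_LIMIT = 200  # documented maximum page size


def iter_logs(fetch_page: Callable[..., List[Dict]], page_size: int = 50) -> Iterator[Dict]:
    """Yield every log entry by paging with limit/offset."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError("page_size must be between 1 and 200")
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        if not page:
            return
        yield from page
        if len(page) < page_size:  # short page: nothing left to fetch
            return
        offset += len(page)
```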

GET /api/v1/usage/quota

GET /api/v1/usage/quota Current quota & lifetime counters

Response: 200 OK (key with a monthly quota)

{
  "monthly_char_limit": 1000000,       // 0 = unlimited
  "monthly_chars_used": 245000,
  "monthly_chars_remaining": 755000,   // null if unlimited
  "unlimited": false,
  "quota_resets_at": "2026-03-01T00:00:00",
  "rate_limit": 60,
  "total_requests": 1420,              // lifetime
  "total_chars": 285000,               // lifetime
  "total_audio_bytes": 45219840        // lifetime
}

POST /admin/api/keys

Admin required. All admin endpoints require an API key with is_admin = true. Non-admin keys receive 403.
POST /admin/api/keys Create a new API key

Request Body

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | | Human-readable name (1–100 chars) |
| description | string | No | "" | Optional description (max 500) |
| rate_limit | int | No | 60 | Requests per minute (1–1000) |
| monthly_char_limit | int | No | 0 | Monthly character quota (0 = unlimited) |
| is_admin | bool | No | false | Grant admin privileges |
| allowed_voices | string | No | null | Comma-separated voice IDs (null = all) |

Example

curl -X POST https://voice.swaa.life/admin/api/keys \
  -H "X-API-Key: vai_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Mobile App",
    "description": "Production key for iOS app",
    "rate_limit": 100,
    "monthly_char_limit": 2000000
  }'

Response: 200 OK

{
  "id": "a1b2c3d4-e5f6-...",
  "name": "Mobile App",
  "description": "Production key for iOS app",
  "key_prefix": "vai_3f8a",
  "is_admin": false,
  "is_active": true,
  "rate_limit": 100,
  "monthly_char_limit": 2000000,
  "monthly_chars_used": 0,
  "total_requests": 0,
  "total_chars": 0,
  "total_audio_bytes": 0,
  "created_at": "2026-02-07T18:00:00",
  "api_key": "vai_3f8a1b2c3d4e5f6a7b8c9d0e1f2a3b4c"   // SHOWN ONLY ONCE
}
Important: The api_key field is only returned at creation time. Store it securely — it cannot be retrieved again.

GET /admin/api/keys

GET /admin/api/keys List all API keys with usage counters

Query Parameters

| Param | Type | Default | Description |
|---|---|---|---|
| include_inactive | bool | false | Include revoked keys |

Response: 200 OK

[
  {
    "id": "619a4341-...",
    "name": "Bootstrap Admin",
    "description": null,
    "key_prefix": "vai_fb0e",
    "is_admin": true,
    "is_active": true,
    "rate_limit": 1000,
    "monthly_char_limit": 0,
    "monthly_chars_used": 0,
    "total_requests": 142,
    "total_chars": 28500,
    "total_audio_bytes": 4521984,
    "created_at": "2026-02-07T10:00:00",
    "last_used_at": "2026-02-07T18:15:00",
    "expires_at": null
  }
]

DELETE /admin/api/keys/{id}

DELETE /admin/api/keys/{id} Revoke an API key (soft delete)

Sets is_active = false. The key record remains for audit purposes, and any subsequent request using the key fails authentication immediately.

Response: 200 OK

{ "detail": "API key revoked." }

Errors

| Code | Condition |
|---|---|
| 404 | Key ID not found |

GET /admin/api/keys/{id}/usage

GET /admin/api/keys/{id}/usage Detailed usage for a specific key

Query Parameters

| Param | Type | Default | Description |
|---|---|---|---|
| days | int | 30 | Number of days to query |

Response: 200 OK

{
  "key": {
    "id": "a1b2c3d4-...",
    "name": "Mobile App",
    "total_requests": 500,
    "total_chars": 100000,
    "total_audio_bytes": 16000000,
    "monthly_char_limit": 500000,
    "monthly_chars_used": 45000
  },
  "recent_logs": [
    {
      "id": 100,
      "endpoint": "/api/v1/tts",
      "voice": "ta-IN-PallaviNeural",
      "chars": 42,
      "audio_bytes": 8640,
      "response_ms": 1234,
      "status": 200,
      "cache_hit": false,
      "client_ip": "72.61.243.39",
      "user_agent": "Mozilla/5.0...",
      "created_at": "2026-02-07T18:15:00"
    }
  ],
  "daily": [
    {
      "date": "2026-02-07",
      "requests": 23,
      "chars": 4500,
      "audio_bytes": 720000,
      "cache_hits": 8,
      "errors": 0,
      "avg_response_ms": 1100
    }
  ]
}

GET /admin/api/stats

GET /admin/api/stats Platform-wide statistics

Query Parameters

| Param | Type | Default | Description |
|---|---|---|---|
| days | int | 30 | Number of days to aggregate |

Response: 200 OK

{
  "total_keys": 5,
  "active_keys": 4,
  "total_requests": 1420,
  "total_chars": 285000,
  "total_audio_bytes": 45219840,
  "avg_response_ms": 1250,
  "cache_hit_rate": 0.35,
  "top_voices": [{ "voice": "ta-IN-PallaviNeural", "requests": 500 }],
  "top_languages": [{ "language": "ta-IN", "requests": 600 }],
  "top_keys": [{ "key_id": "a1b2...", "key_name": "Mobile App", "requests": 500, "chars": 100000 }],
  "daily_trend": [{ "date": "2026-02-01", "requests": 100, "chars": 20000, "errors": 0, "cache_hits": 35 }],
  "requests_today": 45,
  "chars_today": 9000,
  "errors_today": 0
}

Rate Limits & Quotas

Per-Key Rate Limiting

Each API key has a configurable rate_limit (requests per minute, default 60). The limit is enforced with a sliding-window algorithm backed by a Redis sorted set. When the limit is exceeded, the API responds with:

HTTP 429 Too Many Requests
Retry-After: 60

{ "detail": "Rate limit exceeded. 60 requests per 60s allowed." }
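To make the sliding-window idea concrete, here is an in-memory sketch that keeps one timestamp per accepted request and discards those older than the window. It is not the service's implementation: `SlidingWindowLimiter` is a hypothetical class, and a deque stands in for the Redis sorted set. In Redis, the same steps would presumably map to trimming old scores, counting members, and adding the new timestamp.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = {}  # key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        cutoff = now - self.window
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should answer 429 with Retry-After
        hits.append(now)
        return True
```

Unlike a fixed per-minute counter, this approach never lets a client send a double burst across a minute boundary, because the window always covers the most recent 60 seconds.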

Monthly Character Quotas

Optional per-key monthly character limit. When monthly_char_limit > 0, each request's text length is counted. Quotas reset automatically on the 1st of each month (UTC). When exceeded:

HTTP 429 Too Many Requests

{
  "detail": "Monthly character quota exceeded.",
  "quota": 1000000,
  "used": 1000000,
  "remaining": 0,
  "resets_at": "2026-03-01T00:00:00"
}
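The quota check and the reset date can be sketched as follows. This is an illustration under assumptions: `next_reset` and `check_quota` are hypothetical names, and the rule that a request is rejected when it would push usage past the limit is a guess, since the text does not say whether the check happens before or after counting the request.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def next_reset(now: datetime) -> datetime:
    """First day of the following month at 00:00 UTC."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def check_quota(limit: int, used: int, text: str, now: datetime) -> Optional[Dict]:
    """Return a 429-style error body if the quota is exhausted, else None."""
    if limit == 0:  # 0 means unlimited
        return None
    if used + len(text) > limit:  # assumption: reject before synthesizing
        return {
            "detail": "Monthly character quota exceeded.",
            "quota": limit,
            "used": used,
            "remaining": max(limit - used, 0),
            "resets_at": next_reset(now).strftime("%Y-%m-%dT%H:%M:%S"),
        }
    return None
```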

Ingress Rate Limiting

Nginx Ingress applies a separate per-IP rate limit: 50 requests/second with 5x burst (250). This is independent of the per-key API rate limit.

Caching

TTS responses are cached in Redis for 1 hour (configurable). The cache key is:

SHA-256("text|voice|rate|pitch") → tts:cache:<hash>

Identical requests with the same text, voice, rate, and pitch combination return the cached audio in ~1-3ms instead of ~1500ms from the Microsoft TTS API. Cache hits are tracked in usage logs (cache_hit: true).

Streaming bypass: /api/v1/tts/stream does not use the cache. Only /api/v1/tts reads and writes the cache.
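The cache key derivation can be reproduced in a few lines. The sketch below follows the formula given above; the function name `cache_key` and the UTF-8 encoding of the joined string are assumptions.

```python
import hashlib


def cache_key(text: str, voice: str, rate: str = "+0%", pitch: str = "+0Hz") -> str:
    """Build the Redis key for a synthesis request: tts:cache:<sha256 hex>."""
    payload = f"{text}|{voice}|{rate}|{pitch}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"tts:cache:{digest}"
```

Because every parameter is part of the hashed string, changing only the rate or pitch produces a different key and therefore a cache miss.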

Error Codes

| Code | Meaning | Detail |
|---|---|---|
| 400 | Bad Request | Invalid voice ID, text exceeds 5000 chars, malformed JSON body |
| 401 | Unauthorized | Missing X-API-Key header, invalid key, expired key, or revoked key |
| 403 | Forbidden | Non-admin key used on admin endpoint |
| 404 | Not Found | Invalid key ID in admin endpoint |
| 429 | Too Many Requests | Rate limit exceeded (includes Retry-After header) or monthly quota exceeded (includes quota details) |
| 500 | Server Error | TTS engine failure, database error, or unhandled exception |

All errors return JSON with a detail field:

{ "detail": "Error description here" }
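A client can use the status code and body to decide whether a retry makes sense. The helper below is one possible policy, not a documented contract: `retry_delay` is a hypothetical function, it tells the two kinds of 429 apart by the `resets_at` field of the quota response, and the one-second pause after a 5xx error is an arbitrary choice for illustration.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], body: Mapping) -> Optional[float]:
    """Seconds to wait before retrying, or None if a retry will not help."""
    if status == 429:
        if "resets_at" in body:
            return None  # monthly quota exhausted: wait for the reset date
        return float(headers.get("Retry-After", 60))  # per-key rate limit
    if status >= 500:
        return 1.0  # assumption: transient engine failure, retry shortly
    return None  # 400/401/403/404: fix the request or the key instead
```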

Response Headers (TTS)

| Header | Type | Description |
|---|---|---|
| X-Processing-Time-Ms | int | Server-side processing time in milliseconds |
| X-Chars-Processed | int | Number of characters in the input text |
| X-Audio-Bytes | int | Size of the returned audio in bytes |
| X-Audio-Duration-Ms | int | Estimated audio duration in milliseconds (based on 48kbps bitrate) |
| X-Cache-Hit | bool | true if served from Redis cache, false if freshly synthesized |
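The duration estimate follows directly from the bitrate: at 48 kbps, each second of audio occupies 48000 / 8 = 6000 bytes. The sketch below shows the arithmetic; `estimate_duration_ms` is a hypothetical name, and the use of integer division is an assumption about rounding.

```python
BITRATE_BPS = 48_000  # 48 kbps MP3 output


def estimate_duration_ms(audio_bytes: int, bitrate_bps: int = BITRATE_BPS) -> int:
    """Estimate playback time from file size at a constant bitrate."""
    return audio_bytes * 8 * 1000 // bitrate_bps
```

With the example values used on this page, 8640 bytes gives 1440 ms, and 4521984 bytes gives 753664 ms, matching X-Audio-Duration-Ms and total_audio_duration_ms respectively.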

Database Schema

api_keys

| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| name | VARCHAR(100) | Human-readable key name |
| key_hash | VARCHAR(64) | SHA-256 hash of full key (indexed, unique) |
| key_prefix | VARCHAR(8) | Display prefix (e.g., vai_fb0e) |
| is_admin | BOOLEAN | Admin access flag |
| is_active | BOOLEAN | Revocation flag (soft delete) |
| rate_limit | INTEGER | Requests per minute |
| monthly_char_limit | BIGINT | 0 = unlimited |
| monthly_chars_used | BIGINT | Current month character usage |
| quota_reset_at | TIMESTAMP | First of next month (auto-resets) |
| description | TEXT | Optional description |
| allowed_voices | TEXT | Comma-separated voice filter (null = all) |
| allowed_ips | TEXT | Comma-separated CIDR filter |
| total_requests | BIGINT | Lifetime request counter |
| total_chars | BIGINT | Lifetime character counter |
| total_audio_bytes | BIGINT | Lifetime audio size counter |
| created_at | TIMESTAMP | Key creation time |
| last_used_at | TIMESTAMP | Last API call time |
| expires_at | TIMESTAMP | Optional expiry date |

usage_logs

| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL | Primary key |
| api_key_id | VARCHAR(36) | Foreign key to api_keys |
| api_key_name | VARCHAR(100) | Denormalized for fast queries |
| endpoint | VARCHAR(100) | Request path |
| method | VARCHAR(10) | HTTP method |
| client_ip | VARCHAR(45) | Client IP (from X-Forwarded-For) |
| user_agent | VARCHAR(500) | User-Agent header (truncated) |
| language | VARCHAR(10) | Language code (ta-IN, en-US, etc.) |
| voice | VARCHAR(50) | Voice ID used |
| chars_processed | INTEGER | Input text character count |
| text_hash | VARCHAR(16) | First 16 chars of SHA-256 (dedup analysis) |
| audio_bytes | INTEGER | Output audio size |
| audio_duration_ms | INTEGER | Estimated audio duration |
| response_time_ms | INTEGER | Server processing time |
| status_code | INTEGER | HTTP status returned |
| cache_hit | BOOLEAN | Whether served from cache |
| error_detail | VARCHAR(500) | Error message (if failed) |
| rate | VARCHAR(10) | Speed adjustment used |
| pitch | VARCHAR(10) | Pitch adjustment used |
| created_at | TIMESTAMP | Request timestamp |

daily_usage

| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL | Primary key |
| api_key_id | VARCHAR(36) | Foreign key to api_keys |
| date | VARCHAR(10) | YYYY-MM-DD |
| requests | INTEGER | Request count |
| chars_processed | BIGINT | Total characters |
| audio_bytes | BIGINT | Total audio generated |
| cache_hits | INTEGER | Cache hit count |
| errors | INTEGER | Error count |
| avg_response_ms | INTEGER | Running average response time |
| language_breakdown | TEXT | JSON: {"ta-IN": 5, "en-US": 3} |
| voice_breakdown | TEXT | JSON: {"ta-IN-PallaviNeural": 5} |

Configuration

All environment variables use the VOICEAI_ prefix and are set via Kubernetes ConfigMap/Secret.

| Variable | Default | Description |
|---|---|---|
| VOICEAI_APP_NAME | Voice AI Platform | Application display name |
| VOICEAI_APP_VERSION | 1.0.0 | Version string |
| VOICEAI_DEBUG | false | Enable debug mode (exposes Swagger at /docs) |
| VOICEAI_DATABASE_URL | postgresql+asyncpg://... | PostgreSQL async connection string |
| VOICEAI_REDIS_URL | redis://...6379/2 | Redis connection string (DB index 2) |
| VOICEAI_ADMIN_API_KEY | "" | Bootstrap admin key (auto-inserted on startup) |
| VOICEAI_RATE_LIMIT_REQUESTS | 60 | Default requests per minute for new keys |
| VOICEAI_RATE_LIMIT_WINDOW | 60 | Rate limit window in seconds |
| VOICEAI_TTS_MAX_CHARS | 5000 | Maximum characters per TTS request |
| VOICEAI_TTS_CACHE_TTL | 3600 | Redis cache TTL in seconds (1 hour) |
| VOICEAI_CORS_ORIGINS | * | Comma-separated allowed CORS origins |
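As an illustration of how prefixed variables might be read, the sketch below loads a subset of them with the standard library and falls back to the documented defaults. The `Settings` dataclass and `load_settings` function are hypothetical. The service itself may well use a settings library instead, which this page does not specify.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

PREFIX = "VOICEAI_"


@dataclass(frozen=True)
class Settings:
    debug: bool
    rate_limit_requests: int
    rate_limit_window: int
    tts_max_chars: int
    tts_cache_ttl: int
    cors_origins: Tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read VOICEAI_* variables, using the documented defaults when unset."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    origins = tuple(o.strip() for o in get("CORS_ORIGINS", "*").split(",") if o.strip())
    return Settings(
        debug=get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        rate_limit_requests=int(get("RATE_LIMIT_REQUESTS", "60")),
        rate_limit_window=int(get("RATE_LIMIT_WINDOW", "60")),
        tts_max_chars=int(get("TTS_MAX_CHARS", "5000")),
        tts_cache_ttl=int(get("TTS_CACHE_TTL", "3600")),
        cors_origins=origins,
    )
```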

Security

Transport

All traffic is encrypted via TLS 1.2+. HTTP requests are 301-redirected to HTTPS. Certificates are auto-provisioned and renewed by cert-manager + Let's Encrypt.

Data Protection

| Data | Storage | Protection |
|---|---|---|
| API keys (raw) | Never stored | SHA-256 hashed before persistence |
| Input text | Not stored | Only first 16 chars of SHA-256 hash for dedup |
| Audio output | Redis (1h TTL) | Ephemeral cache, auto-evicted |
| Client IPs | usage_logs table | Logged for audit (consider retention policy) |
| DB/Redis passwords | K8s Secret | Base64-encoded at rest (an encoding, not encryption), TLS in transit |

Access Control

| Role | Accessible Endpoints |
|---|---|
| Public | GET /health, GET /api/v1/voices |
| User (any valid key) | POST /api/v1/tts, POST /api/v1/tts/stream, GET /api/v1/usage* |
| Admin (is_admin=true) | All user endpoints + GET/POST/DELETE /admin/api/* |