Voice AI Platform
Complete API & platform documentation for the multilingual Text-to-Speech REST API.
Platform Overview
Voice AI is a production-grade TTS API supporting 14 neural voices across 7 language variants (Tamil, Hindi, Telugu, Malayalam, English IN/US/UK). It delivers sub-second cached responses, per-key rate limiting, monthly character quotas, and full request-level audit logging.
Architecture
Client ──HTTPS──▶ Nginx Ingress (TLS + rate limit 50rps)
                               │
                               ▼
            FastAPI Pod (uvicorn, 2 async workers)
                 │             │             │
                 ▼             ▼             ▼
              API Key     Rate Limit    TTS Engine
              Lookup         Check      (edge-tts)
           (PostgreSQL)     (Redis)          │
                                             ▼
                                      Microsoft Edge
                                     Neural TTS (WSS)
                                             │
                 ┌───────────────────────────┘
                 ▼
            Redis Cache ──▶ Return audio/mpeg
      (1h TTL, SHA-256 keyed)
                 │
                 ▼
    Usage Logging (PostgreSQL)
      ├─ usage_logs (per-request)
      ├─ daily_usage (aggregated)
      └─ api_keys (counters updated)
Supported Voices & Languages
| Language | Code | Female Voice | Male Voice | Sample Text |
| Tamil | ta-IN | ta-IN-PallaviNeural | ta-IN-ValluvarNeural | வணக்கம், இது ஒரு குரல் சோதனை. |
| Hindi | hi-IN | hi-IN-SwaraNeural | hi-IN-MadhurNeural | नमस्ते, यह एक आवाज़ परीक्षण है। |
| Telugu | te-IN | te-IN-ShrutiNeural | te-IN-MohanNeural | నమస్కారం, ఇది ఒక వాయిస్ టెస్ట్. |
| Malayalam | ml-IN | ml-IN-SobhanaNeural | ml-IN-MidhunNeural | നമസ്കാരം, ഇതൊരു ശബ്ദ പരിശോധനയാണ്. |
| English (IN) | en-IN | en-IN-NeerjaNeural | en-IN-PrabhatNeural | Hello, this is a voice test. |
| English (US) | en-US | en-US-JennyNeural | en-US-GuyNeural | Hello, this is a voice test. |
| English (UK) | en-GB | en-GB-SoniaNeural | en-GB-RyanNeural | Hello, this is a voice test. |
Authentication
All API endpoints (except /health and /api/v1/voices) require an API key passed via the X-API-Key header.
curl -H "X-API-Key: vai_your_key_here" https://voice.swaa.life/api/v1/usage/quota
Authentication Flow
X-API-Key header ▶ SHA-256 hash ▶ DB lookup by hash ▶ Check active & expiry ▶ Proceed
For admin endpoints, an additional check ensures is_admin = true. Non-admin keys receive 403 Forbidden.
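The flow above, including the admin check, can be sketched in Python. This is a minimal illustration, not the service's code: it assumes an in-memory dictionary keyed by SHA-256 hex digest in place of the PostgreSQL lookup, timezone-aware UTC expiry timestamps, and UTF-8 encoding of the key before hashing. `KeyRecord`, `AuthError`, and `authenticate` are hypothetical names, and the detail strings are illustrative.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class KeyRecord:
    # Hypothetical subset of the api_keys columns needed for authentication.
    name: str
    is_active: bool = True
    is_admin: bool = False
    expires_at: Optional[datetime] = None  # assumed timezone-aware UTC


class AuthError(Exception):
    """Carries the HTTP status the API would return."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


def authenticate(raw_key: Optional[str], keys_by_hash: Dict[str, KeyRecord],
                 require_admin: bool = False,
                 now: Optional[datetime] = None) -> KeyRecord:
    # 1. The X-API-Key header must be present.
    if not raw_key:
        raise AuthError(401, "Missing X-API-Key header.")
    # 2. Hash the presented key; only hashes are stored server-side.
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    # 3. Look up by hash (a database query in the real service).
    record = keys_by_hash.get(key_hash)
    if record is None:
        raise AuthError(401, "Invalid API key.")
    # 4. Check the active flag and the optional expiry.
    now = now or datetime.now(timezone.utc)
    if not record.is_active:
        raise AuthError(401, "API key revoked.")
    if record.expires_at is not None and record.expires_at <= now:
        raise AuthError(401, "API key expired.")
    # Admin endpoints additionally require is_admin = true.
    if require_admin and not record.is_admin:
        raise AuthError(403, "Admin privileges required.")
    # 5. Proceed.
    return record
```

Note the split between 401 (the key itself is unusable) and 403 (the key is valid but lacks admin rights), matching the error tables.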
Keys follow the format vai_<32 hex characters> (36 characters total): the vai_ prefix followed by 32 hex characters generated with secrets.token_hex(16). Example:
vai_fb0ee9a84bb95cc5673dc0adaaff6ac2
Note: The raw key is only shown at creation time. Only the SHA-256 hash is stored server-side. Lost keys cannot be recovered — create a new one and revoke the old one.
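A minimal sketch of how a key in this format, its stored hash, and its display prefix relate. The `secrets.token_hex(16)` call comes from the format description above; the UTF-8 encoding before hashing and the 8-character prefix slice (inferred from the example vai_fb0e) are assumptions, and the function names are hypothetical.

```python
import hashlib
import secrets


def generate_api_key() -> str:
    # "vai_" prefix + 32 hex characters (16 random bytes) = 36 characters total.
    return "vai_" + secrets.token_hex(16)


def hash_api_key(raw_key: str) -> str:
    # 64-character hex digest, matching the api_keys.key_hash VARCHAR(64) column.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def key_prefix(raw_key: str) -> str:
    # First 8 characters (e.g. "vai_fb0e"), matching api_keys.key_prefix VARCHAR(8).
    return raw_key[:8]


raw = generate_api_key()
# Return `raw` to the caller exactly once; persist only `record`.
record = {"key_hash": hash_api_key(raw), "key_prefix": key_prefix(raw)}
```

Because only the digest is persisted, a lost key cannot be reconstructed from the database; the prefix exists purely so humans can tell keys apart in listings.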
POST /api/v1/tts
Returns audio/mpeg binary (MP3, 48kbps mono). Identical requests made within 1 hour are served from the Redis cache in ~1-3ms.
Request Headers
| Header | Required | Description |
| X-API-Key | Yes | Your API key |
| Content-Type | Yes | application/json |
Request Body
| Field | Type | Required | Default | Description |
| text | string | Yes | — | Text to synthesize (1–5000 chars) |
| voice | string | Yes | — | Voice ID (e.g., ta-IN-PallaviNeural) |
| rate | string | No | +0% | Speed: -50% to +50% |
| pitch | string | No | +0Hz | Pitch: -20Hz to +20Hz |
Example Request
curl -X POST https://voice.swaa.life/api/v1/tts \
-H "X-API-Key: vai_your_key" \
-H "Content-Type: application/json" \
-d '{"text": "வணக்கம் உலகம்", "voice": "ta-IN-PallaviNeural"}' \
--output speech.mp3
Response: 200 OK
Content-Type: audio/mpeg — MP3 audio binary
Response Headers
| Header | Example | Description |
| X-Processing-Time-Ms | 1234 | Server processing time (ms) |
| X-Chars-Processed | 13 | Input text character count |
| X-Audio-Bytes | 8640 | Audio file size (bytes) |
| X-Audio-Duration-Ms | 1440 | Estimated audio duration (ms) |
| X-Cache-Hit | false | true if served from Redis |
Errors
| Code | Condition |
| 400 | Invalid voice ID, text too long, or malformed JSON |
| 401 | Missing, invalid, expired, or revoked API key |
| 429 | Rate limit exceeded or monthly quota exceeded |
| 500 | TTS engine failure |
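The curl example above can be expressed with Python's standard library. This is a sketch, not an official client: `build_tts_request` and `synthesize` are hypothetical helpers, the defaults for rate and pitch are sent explicitly using the documented values, and the assumption is that X-Cache-Hit carries the lowercase strings true/false shown in the header table.

```python
import json
import urllib.request

API_BASE = "https://voice.swaa.life"


def build_tts_request(api_key: str, text: str, voice: str,
                      rate: str = "+0%", pitch: str = "+0Hz") -> urllib.request.Request:
    """Build a POST /api/v1/tts request equivalent to the curl example."""
    body = json.dumps({"text": text, "voice": voice, "rate": rate, "pitch": pitch},
                      ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/tts",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def synthesize(api_key: str, text: str, voice: str, out_path: str) -> dict:
    """Send the request, save the MP3 to out_path, and return selected headers."""
    req = build_tts_request(api_key, text, voice)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx; its body holds a JSON "detail".
    with urllib.request.urlopen(req, timeout=60) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
        return {
            "cache_hit": resp.headers.get("X-Cache-Hit") == "true",
            "processing_ms": int(resp.headers.get("X-Processing-Time-Ms", "0")),
            "audio_bytes": int(resp.headers.get("X-Audio-Bytes", "0")),
        }
```

For example, `synthesize("vai_your_key", "வணக்கம் உலகம்", "ta-IN-PallaviNeural", "speech.mp3")` mirrors the curl call; a second identical call within the cache TTL would be expected to report `cache_hit` as true.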
POST /api/v1/tts/stream
Same request body as /api/v1/tts. Returns a chunked audio/mpeg stream: audio chunks are sent as they are generated, so playback can begin before the full audio is ready. Suited to long texts.
Example
curl -X POST https://voice.swaa.life/api/v1/tts/stream \
-H "X-API-Key: vai_your_key" \
-H "Content-Type: application/json" \
-d '{"text": "Long text here...", "voice": "en-US-JennyNeural"}' \
--output stream.mp3
Note: Streaming responses bypass the Redis cache. Each streaming request hits the Microsoft TTS API.
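To benefit from streaming, a client should consume the body incrementally rather than reading it all at once. A minimal standard-library sketch follows; `copy_stream` and `stream_tts` are hypothetical names and the 4096-byte chunk size is an arbitrary choice. Python's http.client decodes chunked transfer encoding transparently, so `read(n)` simply returns the next available bytes.

```python
import io
import json
import urllib.request
from typing import BinaryIO


def copy_stream(resp: BinaryIO, out: BinaryIO, chunk_size: int = 4096) -> int:
    """Write audio chunks as they arrive; return the total bytes written."""
    total = 0
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            break  # end of stream
        out.write(chunk)  # a player could consume each chunk here instead
        total += len(chunk)
    return total


def stream_tts(api_key: str, text: str, voice: str, out_path: str) -> int:
    """POST to /api/v1/tts/stream and save audio to disk while it is generated."""
    req = urllib.request.Request(
        "https://voice.swaa.life/api/v1/tts/stream",
        data=json.dumps({"text": text, "voice": voice}).encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp, open(out_path, "wb") as f:
        return copy_stream(resp, f)


# Offline check: an in-memory buffer stands in for the HTTP response body.
written = copy_stream(io.BytesIO(b"\x00" * 10_000), io.BytesIO())
```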
GET /api/v1/voices
Response: 200 OK
{
"voices": [
{
"id": "ta-IN-PallaviNeural",
"name": "Pallavi",
"language": "Tamil",
"language_code": "ta-IN",
"gender": "Female",
"sample_text": "வணக்கம், இது ஒரு குரல் சோதனை."
}, // ... 14 voices total
],
"total": 14,
"languages": ["English (India)", "English (UK)", "English (US)", "Hindi", "Malayalam", "Tamil", "Telugu"]
}
GET /api/v1/usage
Query Parameters
| Param | Type | Default | Description |
| days | int | 30 | Number of days to aggregate |
Response: 200 OK
{
"total_requests": 142,
"total_chars": 28500,
"total_audio_bytes": 4521984,
"total_audio_duration_ms": 753664,
"cache_hit_rate": 0.35,
"avg_response_ms": 1250,
"period_start": "2026-01-08T00:00:00",
"period_end": "2026-02-07T18:15:00",
"by_language": { "ta-IN": 50, "en-US": 92 },
"by_voice": { "ta-IN-PallaviNeural": 50, "en-US-JennyNeural": 92 },
"by_status": { "200": 140, "429": 2 },
"daily": [
{
"date": "2026-02-01", "requests": 23, "chars": 4500,
"audio_bytes": 720000, "cache_hits": 8, "errors": 0,
"avg_response_ms": 1100
}
]
}
GET /api/v1/usage/logs
Query Parameters
| Param | Type | Default | Description |
| limit | int | 50 | Results per page (max 200) |
| offset | int | 0 | Pagination offset |
Response: 200 OK
[{
"id": 1,
"endpoint": "/api/v1/tts",
"method": "POST",
"voice": "ta-IN-PallaviNeural",
"language": "ta-IN",
"chars_processed": 42,
"audio_bytes": 8640,
"response_time_ms": 1234,
"status_code": 200,
"cache_hit": false,
"client_ip": "72.61.243.39",
"created_at": "2026-02-07T18:15:00"
}]
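The limit/offset parameters support walking the full log history page by page. In this sketch, `fetch_page(limit, offset)` is a hypothetical callable that performs GET /api/v1/usage/logs and returns the decoded JSON list; it assumes a stable ordering between calls, since rows inserted while paging can shift offsets.

```python
from typing import Callable, Iterator, List

MAX_LIMIT = 200  # documented maximum for `limit`


def iter_usage_logs(fetch_page: Callable[[int, int], List[dict]],
                    limit: int = MAX_LIMIT) -> Iterator[dict]:
    """Yield every log entry, requesting pages until a short page arrives."""
    limit = min(limit, MAX_LIMIT)
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page means there is nothing further
        offset += limit
```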
GET /api/v1/usage/quota
Response: 200 OK (example for a key with a monthly quota; the // comments are annotations, not part of the JSON)
{
"monthly_char_limit": 1000000, // 0 = unlimited
"monthly_chars_used": 245000,
"monthly_chars_remaining": 755000, // null if unlimited
"unlimited": false,
"quota_resets_at": "2026-03-01T00:00:00",
"rate_limit": 60,
"total_requests": 1420, // lifetime
"total_chars": 285000, // lifetime
"total_audio_bytes": 45219840 // lifetime
}
POST /admin/api/keys
Admin required. All admin endpoints require an API key with is_admin = true. Non-admin keys receive 403.
Request Body
| Field | Type | Required | Default | Description |
| name | string | Yes | — | Human-readable name (1–100 chars) |
| description | string | No | "" | Optional description (max 500) |
| rate_limit | int | No | 60 | Requests per minute (1–1000) |
| monthly_char_limit | int | No | 0 | Monthly character quota (0 = unlimited) |
| is_admin | bool | No | false | Grant admin privileges |
| allowed_voices | string | No | null | Comma-separated voice IDs (null = all) |
Example
curl -X POST https://voice.swaa.life/admin/api/keys \
-H "X-API-Key: vai_admin_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Mobile App",
"description": "Production key for iOS app",
"rate_limit": 100,
"monthly_char_limit": 2000000
}'
Response: 200 OK
{
"id": "a1b2c3d4-e5f6-...",
"name": "Mobile App",
"description": "Production key for iOS app",
"key_prefix": "vai_3f8a",
"is_admin": false,
"is_active": true,
"rate_limit": 100,
"monthly_char_limit": 2000000,
"monthly_chars_used": 0,
"total_requests": 0,
"total_chars": 0,
"total_audio_bytes": 0,
"created_at": "2026-02-07T18:00:00",
"api_key": "vai_3f8a1b2c3d4e5f6a7b8c9d0e1f2a3b4c" // SHOWN ONLY ONCE
}
Important: The api_key field is only returned at creation time. Store it securely — it cannot be retrieved again.
GET /admin/api/keys
Query Parameters
| Param | Type | Default | Description |
| include_inactive | bool | false | Include revoked keys |
Response: 200 OK
[{
"id": "619a4341-...",
"name": "Bootstrap Admin",
"description": null,
"key_prefix": "vai_fb0e",
"is_admin": true,
"is_active": true,
"rate_limit": 1000,
"monthly_char_limit": 0,
"monthly_chars_used": 0,
"total_requests": 142,
"total_chars": 28500,
"total_audio_bytes": 4521984,
"created_at": "2026-02-07T10:00:00",
"last_used_at": "2026-02-07T18:15:00",
"expires_at": null
}]
DELETE /admin/api/keys/{id}
Revokes the key by setting is_active = false. The key record remains for audit purposes, and any later request using the key fails authentication with 401.
Response: 200 OK
{ "detail": "API key revoked." }
Errors
| Code | Condition |
| 404 | Key ID not found |
GET /admin/api/keys/{id}/usage
Query Parameters
| Param | Type | Default | Description |
| days | int | 30 | Number of days to query |
Response: 200 OK
{
"key": {
"id": "a1b2c3d4-...", "name": "Mobile App",
"total_requests": 500, "total_chars": 100000,
"total_audio_bytes": 16000000,
"monthly_char_limit": 500000, "monthly_chars_used": 45000
},
"recent_logs": [{
"id": 100, "endpoint": "/api/v1/tts",
"voice": "ta-IN-PallaviNeural", "chars": 42,
"audio_bytes": 8640, "response_ms": 1234,
"status": 200, "cache_hit": false,
"client_ip": "72.61.243.39",
"user_agent": "Mozilla/5.0...",
"created_at": "2026-02-07T18:15:00"
}],
"daily": [{
"date": "2026-02-07", "requests": 23, "chars": 4500,
"audio_bytes": 720000, "cache_hits": 8,
"errors": 0, "avg_response_ms": 1100
}]
}
GET /admin/api/stats
Query Parameters
| Param | Type | Default | Description |
| days | int | 30 | Number of days to aggregate |
Response: 200 OK
{
"total_keys": 5, "active_keys": 4,
"total_requests": 1420, "total_chars": 285000,
"total_audio_bytes": 45219840,
"avg_response_ms": 1250, "cache_hit_rate": 0.35,
"top_voices": [{ "voice": "ta-IN-PallaviNeural", "requests": 500 }],
"top_languages": [{ "language": "ta-IN", "requests": 600 }],
"top_keys": [{ "key_id": "a1b2...", "key_name": "Mobile App", "requests": 500, "chars": 100000 }],
"daily_trend": [{ "date": "2026-02-01", "requests": 100, "chars": 20000, "errors": 0, "cache_hits": 35 }],
"requests_today": 45, "chars_today": 9000, "errors_today": 0
}
Rate Limits & Quotas
Per-Key Rate Limiting
Each API key has a configurable rate_limit (requests per minute, default 60), enforced with a sliding-window algorithm backed by a Redis sorted set. When the limit is exceeded:
HTTP 429 Too Many Requests
Retry-After: 60
{ "detail": "Rate limit exceeded. 60 requests per 60s allowed." }
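The sliding-window check can be illustrated without a Redis server. The sketch below is an in-memory stand-in, assuming the sorted set holds one member per request scored by its timestamp; `SlidingWindowLimiter` is a hypothetical class, and the comments name the Redis command each step would correspond to. In Redis these steps would normally run inside a MULTI/EXEC transaction or pipeline so concurrent requests see a consistent count.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional


class SlidingWindowLimiter:
    """In-memory analogue of a Redis sorted-set sliding window."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        # key_id -> request timestamps (the sorted set's scores in Redis).
        self._hits: Dict[str, List[float]] = defaultdict(list)

    def allow(self, key_id: str, limit: int, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        hits = self._hits[key_id]
        # ZREMRANGEBYSCORE key -inf <now - window>: drop requests outside the window.
        cutoff = now - self.window_s
        hits[:] = [t for t in hits if t > cutoff]
        # ZCARD key: count requests still inside the window.
        if len(hits) >= limit:
            return False  # caller responds 429 with Retry-After
        # ZADD key <now> <unique member>: record this request.
        hits.append(now)
        return True
```

Unlike a fixed per-minute counter, the window slides with each request, so a burst at the end of one minute cannot be followed by a full second burst at the start of the next.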
Monthly Character Quotas
Optional per-key monthly character limit. When monthly_char_limit > 0, each request's text length counts against the key's quota. Quotas reset automatically on the 1st of each month (UTC). When the quota is exceeded:
HTTP 429 Too Many Requests
{
"detail": "Monthly character quota exceeded.",
"quota": 1000000,
"used": 1000000,
"remaining": 0,
"resets_at": "2026-03-01T00:00:00"
}
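A sketch of the quota decision and the reset timestamp. Two points are assumptions: that a request is rejected when its text would push usage past the limit (the service may instead only check whether usage has already reached the limit), and that characters are counted as Unicode code points with `len()`. Under that counting the Tamil example request has 13 characters, consistent with the X-Chars-Processed example. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def next_quota_reset(now: datetime) -> datetime:
    """Midnight UTC on the 1st of the month after `now` (now given in UTC)."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def check_quota(limit: int, used: int, text: str, now: datetime) -> Dict[str, Any]:
    """Decide whether `text` fits the monthly quota; a limit of 0 means unlimited."""
    if limit == 0:
        return {"allowed": True, "remaining": None}
    remaining = max(limit - used, 0)
    return {
        # Assumption: reject if this request would push usage past the limit.
        "allowed": len(text) <= remaining,  # len() counts Unicode code points
        "quota": limit,
        "used": used,
        "remaining": remaining,
        # Naive ISO string to match the documented "2026-03-01T00:00:00" format.
        "resets_at": next_quota_reset(now).replace(tzinfo=None).isoformat(),
    }
```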
Ingress Rate Limiting
Nginx Ingress applies a separate per-IP rate limit: 50 requests/second with 5x burst (250). This is independent of the per-key API rate limit.
Caching
TTS responses are cached in Redis for 1 hour (configurable). The cache key is:
SHA-256("text|voice|rate|pitch") → tts:cache:<hash>
Identical requests with the same text, voice, rate, and pitch combination return the cached audio in ~1-3ms instead of ~1500ms from the Microsoft TTS API. Cache hits are tracked in usage logs (cache_hit: true).
Streaming bypass: /api/v1/tts/stream does not use the cache. Only /api/v1/tts reads and writes the cache.
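The cache-key scheme above can be reproduced in a few lines. This sketch assumes the four fields are joined exactly as shown, with the documented defaults (+0%, +0Hz) substituted when rate and pitch are omitted, and encoded as UTF-8 before hashing; `tts_cache_key` is a hypothetical name.

```python
import hashlib

CACHE_PREFIX = "tts:cache:"


def tts_cache_key(text: str, voice: str, rate: str = "+0%", pitch: str = "+0Hz") -> str:
    """Build the Redis key for a TTS response from the four synthesis inputs."""
    raw = f"{text}|{voice}|{rate}|{pitch}"
    return CACHE_PREFIX + hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Storing the audio under this key with a TTL would look like `SET <key> <audio> EX 3600` in Redis (with redis-py, `r.set(key, audio_bytes, ex=3600)`). Because any change to text, voice, rate, or pitch changes the digest, a request that differs only in pitch is a cache miss.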
Error Codes
| Code | Meaning | Detail |
| 400 | Bad Request | Invalid voice ID, text exceeds 5000 chars, malformed JSON body |
| 401 | Unauthorized | Missing X-API-Key header, invalid key, expired key, or revoked key |
| 403 | Forbidden | Non-admin key used on admin endpoint |
| 404 | Not Found | Invalid key ID in admin endpoint |
| 429 | Too Many Requests | Rate limit exceeded (includes Retry-After header) or monthly quota exceeded (includes quota details) |
| 500 | Server Error | TTS engine failure, database error, or unhandled exception |
All errors return JSON with a detail field:
{ "detail": "Error description here" }
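The two kinds of 429 call for different client behavior: a rate-limit 429 carries Retry-After and is worth retrying, while a quota 429 will keep failing until the quota resets. A minimal sketch of that decision; the function names are hypothetical and the 1-second backoff for 500 errors is an arbitrary choice.

```python
import json
from typing import Mapping, Optional


def retry_after_seconds(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry will not help."""
    if status == 429:
        if "Retry-After" in headers:
            return float(headers["Retry-After"])  # per-key rate limit
        return None  # monthly quota: wait for resets_at or raise the limit
    if status == 500:
        return 1.0  # possibly transient TTS engine failure; arbitrary backoff
    return None  # 400/401/403/404: fix the request or key instead


def error_detail(body: bytes) -> str:
    """Extract the `detail` field carried by every error response."""
    try:
        return str(json.loads(body).get("detail", ""))
    except (ValueError, AttributeError):
        return body.decode("utf-8", errors="replace")
```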
Response Headers
| Header | Type | Description |
| X-Processing-Time-Ms | int | Server-side processing time in milliseconds |
| X-Chars-Processed | int | Number of characters in the input text |
| X-Audio-Bytes | int | Size of the returned audio in bytes |
| X-Audio-Duration-Ms | int | Estimated audio duration in milliseconds (based on the 48kbps bitrate) |
| X-Cache-Hit | bool | true if served from the Redis cache, false if freshly synthesized |
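At a constant 48 kbps (48,000 bits/s, or 48 bits per millisecond), duration in milliseconds is bytes × 8 / 48. This reproduces the documented examples (8640 bytes gives 1440 ms). The sketch below also converts the headers into Python types; it assumes the server rounds down, which may differ from the real implementation, and the function names are hypothetical.

```python
from typing import Mapping

BITRATE_KBPS = 48  # documented MP3 output bitrate


def estimate_duration_ms(audio_bytes: int, bitrate_kbps: int = BITRATE_KBPS) -> int:
    """Constant-bitrate estimate: total bits divided by bits per millisecond."""
    return audio_bytes * 8 // bitrate_kbps


def parse_tts_headers(headers: Mapping[str, str]) -> dict:
    """Convert the metadata headers of a /api/v1/tts response to Python types."""
    return {
        "processing_ms": int(headers["X-Processing-Time-Ms"]),
        "chars": int(headers["X-Chars-Processed"]),
        "audio_bytes": int(headers["X-Audio-Bytes"]),
        "duration_ms": int(headers["X-Audio-Duration-Ms"]),
        "cache_hit": headers["X-Cache-Hit"].lower() == "true",
    }
```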
Database Schema
api_keys
| Column | Type | Description |
| id | UUID | Primary key |
| name | VARCHAR(100) | Human-readable key name |
| key_hash | VARCHAR(64) | SHA-256 hash of full key (indexed, unique) |
| key_prefix | VARCHAR(8) | Display prefix (e.g., vai_fb0e) |
| is_admin | BOOLEAN | Admin access flag |
| is_active | BOOLEAN | Revocation flag (soft delete) |
| rate_limit | INTEGER | Requests per minute |
| monthly_char_limit | BIGINT | 0 = unlimited |
| monthly_chars_used | BIGINT | Current month character usage |
| quota_reset_at | TIMESTAMP | First of next month (auto-resets) |
| description | TEXT | Optional description |
| allowed_voices | TEXT | Comma-separated voice filter (null = all) |
| allowed_ips | TEXT | Comma-separated CIDR filter |
| total_requests | BIGINT | Lifetime request counter |
| total_chars | BIGINT | Lifetime character counter |
| total_audio_bytes | BIGINT | Lifetime audio size counter |
| created_at | TIMESTAMP | Key creation time |
| last_used_at | TIMESTAMP | Last API call time |
| expires_at | TIMESTAMP | Optional expiry date |
usage_logs
| Column | Type | Description |
| id | BIGSERIAL | Primary key |
| api_key_id | VARCHAR(36) | Foreign key to api_keys |
| api_key_name | VARCHAR(100) | Denormalized for fast queries |
| endpoint | VARCHAR(100) | Request path |
| method | VARCHAR(10) | HTTP method |
| client_ip | VARCHAR(45) | Client IP (from X-Forwarded-For) |
| user_agent | VARCHAR(500) | User-Agent header (truncated) |
| language | VARCHAR(10) | Language code (ta-IN, en-US, etc.) |
| voice | VARCHAR(50) | Voice ID used |
| chars_processed | INTEGER | Input text character count |
| text_hash | VARCHAR(16) | First 16 chars of SHA-256 (dedup analysis) |
| audio_bytes | INTEGER | Output audio size |
| audio_duration_ms | INTEGER | Estimated audio duration |
| response_time_ms | INTEGER | Server processing time |
| status_code | INTEGER | HTTP status returned |
| cache_hit | BOOLEAN | Whether served from cache |
| error_detail | VARCHAR(500) | Error message (if failed) |
| rate | VARCHAR(10) | Speed adjustment used |
| pitch | VARCHAR(10) | Pitch adjustment used |
| created_at | TIMESTAMP | Request timestamp |
daily_usage
| Column | Type | Description |
| id | BIGSERIAL | Primary key |
| api_key_id | VARCHAR(36) | Foreign key to api_keys |
| date | VARCHAR(10) | YYYY-MM-DD |
| requests | INTEGER | Request count |
| chars_processed | BIGINT | Total characters |
| audio_bytes | BIGINT | Total audio generated |
| cache_hits | INTEGER | Cache hit count |
| errors | INTEGER | Error count |
| avg_response_ms | INTEGER | Running average response time |
| language_breakdown | TEXT | JSON: {"ta-IN": 5, "en-US": 3} |
| voice_breakdown | TEXT | JSON: {"ta-IN-PallaviNeural": 5} |
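The daily_usage columns imply an incremental update per request. The sketch below is one way to maintain such a row in memory; it assumes avg_response_ms is kept as an incremental mean rounded to the INTEGER column and that any status of 400 or above counts as an error, neither of which is confirmed here. `DailyUsage` is a hypothetical class.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DailyUsage:
    # Hypothetical in-memory mirror of one daily_usage row (one key, one date).
    requests: int = 0
    chars_processed: int = 0
    audio_bytes: int = 0
    cache_hits: int = 0
    errors: int = 0
    avg_response_ms: int = 0
    language_breakdown: Dict[str, int] = field(default_factory=dict)
    voice_breakdown: Dict[str, int] = field(default_factory=dict)

    def record(self, *, language: str, voice: str, chars: int, audio_bytes: int,
               response_ms: int, cache_hit: bool, status: int) -> None:
        self.requests += 1
        self.chars_processed += chars
        self.audio_bytes += audio_bytes
        self.cache_hits += int(cache_hit)
        self.errors += int(status >= 400)  # assumption: any 4xx/5xx is an error
        # Incremental mean: new_avg = old_avg + (x - old_avg) / n, rounded to int.
        self.avg_response_ms = round(
            self.avg_response_ms + (response_ms - self.avg_response_ms) / self.requests)
        self.language_breakdown[language] = self.language_breakdown.get(language, 0) + 1
        self.voice_breakdown[voice] = self.voice_breakdown.get(voice, 0) + 1

    def breakdown_json(self) -> Dict[str, str]:
        # The TEXT columns store the breakdowns as JSON strings.
        return {"language_breakdown": json.dumps(self.language_breakdown),
                "voice_breakdown": json.dumps(self.voice_breakdown)}
```

Rounding after every update can drift slightly from the exact mean over many requests; storing a running total of response time instead would avoid that.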
Configuration
All environment variables use the VOICEAI_ prefix and are set via Kubernetes ConfigMap/Secret.
| Variable | Default | Description |
| VOICEAI_APP_NAME | Voice AI Platform | Application display name |
| VOICEAI_APP_VERSION | 1.0.0 | Version string |
| VOICEAI_DEBUG | false | Enable debug mode (exposes Swagger at /docs) |
| VOICEAI_DATABASE_URL | postgresql+asyncpg://... | PostgreSQL async connection string |
| VOICEAI_REDIS_URL | redis://...6379/2 | Redis connection string (DB index 2) |
| VOICEAI_ADMIN_API_KEY | "" | Bootstrap admin key (auto-inserted on startup) |
| VOICEAI_RATE_LIMIT_REQUESTS | 60 | Default requests per minute for new keys |
| VOICEAI_RATE_LIMIT_WINDOW | 60 | Rate limit window in seconds |
| VOICEAI_TTS_MAX_CHARS | 5000 | Maximum characters per TTS request |
| VOICEAI_TTS_CACHE_TTL | 3600 | Redis cache TTL in seconds (1 hour) |
| VOICEAI_CORS_ORIGINS | * | Comma-separated allowed CORS origins |
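A stdlib-only sketch of loading these variables, assuming each setting maps to VOICEAI_ plus the upper-cased field name. The service's actual settings mechanism is not documented here, so `Settings` and `load_settings` are hypothetical. The database and Redis URL defaults are elided in the table above, so they are left empty rather than guessed.

```python
import os
from dataclasses import dataclass, fields
from typing import List, Mapping, Optional

PREFIX = "VOICEAI_"


@dataclass
class Settings:
    # Field names map to VOICEAI_<FIELD_NAME_UPPERCASED>; defaults follow the table.
    app_name: str = "Voice AI Platform"
    app_version: str = "1.0.0"
    debug: bool = False
    database_url: str = ""  # real default elided in the table
    redis_url: str = ""     # real default elided in the table
    admin_api_key: str = ""
    rate_limit_requests: int = 60
    rate_limit_window: int = 60
    tts_max_chars: int = 5000
    tts_cache_ttl: int = 3600
    cors_origins: str = "*"

    @property
    def cors_origin_list(self) -> List[str]:
        # VOICEAI_CORS_ORIGINS is a comma-separated list.
        return [o.strip() for o in self.cors_origins.split(",") if o.strip()]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    values = {}
    for f in fields(Settings):
        raw = env.get(PREFIX + f.name.upper())
        if raw is None:
            continue  # keep the default
        if f.type is bool:
            values[f.name] = raw.strip().lower() in ("1", "true", "yes", "on")
        elif f.type is int:
            values[f.name] = int(raw)
        else:
            values[f.name] = raw
    return Settings(**values)
```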
Security
Transport
All traffic is encrypted via TLS 1.2+. HTTP requests are 301-redirected to HTTPS. Certificates are auto-provisioned and renewed by cert-manager + Let's Encrypt.
Data Protection
| Data | Storage | Protection |
| API keys (raw) | Never stored | SHA-256 hashed before persistence |
| Input text | Not stored | Only first 16 chars of SHA-256 hash for dedup |
| Audio output | Redis (1h TTL) | Ephemeral cache, auto-evicted |
| Client IPs | usage_logs table | Logged for audit (consider retention policy) |
| DB/Redis passwords | K8s Secret | Base64-encoded (not encrypted) unless etcd encryption at rest is enabled; TLS in transit |
Access Control
| Role | Accessible Endpoints |
| Public | GET /health, GET /api/v1/voices |
| User (any valid key) | POST /api/v1/tts, POST /api/v1/tts/stream, GET /api/v1/usage, GET /api/v1/usage/logs, GET /api/v1/usage/quota |
| Admin (is_admin=true) | All user endpoints + GET/POST/DELETE /admin/api/* |
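The access-control table can be expressed as an ordered set of roles, where each route requires a minimum role. In this sketch `Role`, `required_role`, and `is_allowed` are hypothetical names, and any path not listed is treated as user-level (a real router would return 404 for unknown paths). A failed check maps to 401 for a missing or invalid key and 403 for a valid non-admin key.

```python
from enum import IntEnum


class Role(IntEnum):
    PUBLIC = 0
    USER = 1
    ADMIN = 2


PUBLIC_ROUTES = {("GET", "/health"), ("GET", "/api/v1/voices")}


def required_role(method: str, path: str) -> Role:
    """Minimum role for a request, following the access-control table."""
    if (method.upper(), path) in PUBLIC_ROUTES:
        return Role.PUBLIC
    if path.startswith("/admin/api/"):
        return Role.ADMIN
    return Role.USER  # /api/v1/tts, /api/v1/tts/stream, /api/v1/usage...


def is_allowed(method: str, path: str, caller: Role) -> bool:
    # Roles are ordered, so an admin key also satisfies user-level routes.
    return caller >= required_role(method, path)
```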