Text-to-Speech Submission API
Text-to-speech submission system for converting user text input into AI-generated audio with shareable links.
BASE API ENDPOINT
/api/v1/text-to-speech
Epic 1 provides manual TTS processing and shareable audio links:
- User submits text + voice selection
- Admin manually processes request (obtains audio from external source)
- Admin uploads audio file to S3
- User receives shareable link after admin completes processing
- QR code generation deferred to future releases
User Flow: Submit Text → Manual Admin Processing → Admin Uploads Audio → Get Shareable Link
Text-to-Speech Submission Entity
Entity Schema
interface SubmissionEntity {
id: string; // UUID - Submission identifier
text: string; // User input text (max 1000 characters)
voiceId: string; // Selected voice ID from Voice Gallery
clientId: string; // Client identifier (auto-populated from JWT token)
slug: string | null; // Unique URL-safe identifier (NULL until status is 'completed')
audioUrl: string | null; // S3 URL of generated audio file (NULL until uploaded)
status: string; // Submission status: "pending", "processing", "completed"
characterCount: number; // Number of characters in text
processingTime: number; // Time taken to generate (in milliseconds)
createdAt: string; // ISO 8601 timestamp - Submission creation time
createdBy: {
userId: string; // User ID from Microsoft Identity
userName: string; // User display name
};
}
Field Descriptions:
- id: Unique identifier (UUID) for the submission
- text: User's input text for audio generation (max 1000 characters)
- voiceId: ID of the selected voice from Voice Gallery
- clientId: Client identifier (automatically populated from JWT token)
- slug: Unique URL-safe identifier for sharing the submission (NULL until status is completed)
- audioUrl: S3 URL of the generated audio file (NULL until uploaded)
- status: Current submission status ("pending", "processing", "completed")
- characterCount: Number of characters in the input text
- processingTime: Time taken to generate audio (in milliseconds)
- createdAt: Timestamp when the submission was created (ISO 8601 format)
- createdBy: Information about the user who created the submission
- userId: User ID from Microsoft Identity
- userName: User's display name
Notes:
- Audio files are manually generated by admin and uploaded to S3 (Epic 1)
- Slug is auto-generated when status changes to completed
  - Generated from text content (URL-safe, unique) with timestamp/UUID suffix
  - Once generated, the slug never changes, which provides stable shareable links
- Playback URL format: https://micdots.com/play/{{slug}}
- S3 filenames are based on the slug: {{slug}}.mp3 (same for both normal and dummy modes)
- clientId and createdBy are automatically populated from the authenticated user's JWT token (NOT sent in request)
- Processing time: Varies based on manual admin processing
Entity Example
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882400123",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882400123.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
Notes:
- The slug includes a timestamp suffix (1706882400123) to ensure uniqueness across all submissions
- Once generated, the slug never changes
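The URL and filename conventions described in the notes above can be captured in a few small helpers. This is a minimal sketch: the function names are hypothetical, and the submissions/ key prefix is taken from the S3 folder structure described in the Dependencies section.

```typescript
// Hypothetical helpers; only the URL and key formats come from this document.
const PLAYBACK_BASE = "https://micdots.com/play";

// Playback URL format: https://micdots.com/play/{{slug}}
function playbackUrl(slug: string): string {
  return `${PLAYBACK_BASE}/${slug}`;
}

// S3 object key for a submission audio file: submissions/{{slug}}.mp3
function submissionAudioKey(slug: string): string {
  return `submissions/${slug}.mp3`;
}

// The slug is NULL until the submission is completed, so callers must handle null.
function shareLink(submission: { slug: string | null }): string | null {
  return submission.slug === null ? null : playbackUrl(submission.slug);
}
```

A pending submission yields no share link, so a UI built on shareLink can show a "processing" state instead of a broken URL.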
Form Fields Reference
The following fields are used in the submission form:
| Field Name | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | User's text input (1-1000 characters) |
| voiceId | string | Yes | Selected voice model ID from Voice Gallery |
Auto-populated fields (NOT sent in request):
- clientId - Automatically extracted from the authenticated user's JWT token
- createdBy - Automatically populated with user ID and name from JWT token
API Endpoint
Submit Text-to-Speech Request
This endpoint is available in MVP 1 for authenticated users to submit text for audio generation.
Endpoint: POST /api/v1/text-to-speech
Headers:
Authorization: Bearer {{access-token}}
Content-Type: application/json
Request Body (JSON):
{
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123"
}
Form Fields: See Form Fields Reference above for complete field documentation.
Processing Flow (Epic 1 - Manual):
- Validate text length and voice ID
- Automatically populate clientId and createdBy from the authenticated user's JWT token (NOT sent in request)
- Create submission record with status pending (slug is NULL initially)
- Return submission entity (without audio URL or slug initially)
  Note: At this point, the user receives a submission confirmation and must wait while the admin processes the request.
- Admin processes manually:
  - Admin generates audio file offline from external source
  - Admin requests pre-signed URL: POST /api/v1/text-to-speech/upload-url
  - Admin uploads MP3 file directly to S3
  - Admin updates submission: PUT /api/v1/text-to-speech/{{id}} with audioUrl and status completed
- System automatically generates unique slug when status changes to completed:
  - Generated from text content with timestamp/UUID suffix (e.g., welcome-audio-123456789)
  - Slug is permanent and never changes after generation
  - Playback URL created: https://micdots.com/play/{{slug}}
Important: The slug is generated automatically when the submission status is updated to completed. This ensures every completed submission has a shareable playback link.
In future releases, the processing flow will be automated:
- Automated text-to-speech generation (via TTS API)
- Audio automatically uploaded to S3
- Immediate completion (no manual admin processing)
Note: clientId and createdBy are NOT sent in the request. The API automatically extracts the client ID, user ID, and user name from the authenticated user's JWT token and populates these fields.
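As an illustration of the request described above, the sketch below builds the POST request without sending it. The helper name buildSubmitRequest and the HttpRequest shape are hypothetical; the path, headers, and body fields are the ones documented for this endpoint.

```typescript
// Only text and voiceId are sent; clientId and createdBy are derived from the JWT by the API.
interface SubmitBody {
  text: string;
  voiceId: string;
}

// Hypothetical, client-agnostic description of an HTTP request.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// baseUrl and accessToken are supplied by the caller.
function buildSubmitRequest(baseUrl: string, accessToken: string, body: SubmitBody): HttpRequest {
  return {
    url: `${baseUrl}/api/v1/text-to-speech`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    // Copy the two fields explicitly so extra properties are never forwarded.
    body: JSON.stringify({ text: body.text, voiceId: body.voiceId }),
  };
}
```

The result can be handed to any HTTP client, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).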
Response: 201 Created
{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": null,
"audioUrl": null,
"status": "pending",
"characterCount": 54,
"processingTime": 0,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
},
"links": {
"share": null,
"audio": null
}
}
Error Response: 400 Bad Request (Validation Error)
{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Validation failed",
"fields": {
"text": ["Text must be between 1 and 1000 characters"],
"voiceId": ["Voice ID is required"]
}
}
}
Error Response: 404 Not Found (Invalid Voice ID)
{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID 'invalid-voice-id' not found or not published."
}
}
Error Response: 500 Internal Server Error (Processing Failed)
{
"success": false,
"error": {
"code": "PROCESSING_FAILED",
"message": "Failed to process request. Please try again."
}
}
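The success and error envelopes shown above share a success flag, which maps naturally onto a discriminated union on the client. The following is a sketch under that assumption; the type and function names are hypothetical, and only the field names come from the examples.

```typescript
// Error payload shape, following the error examples above.
interface ApiError {
  code: string;
  message: string;
  fields?: Record<string, string[]>; // present on VALIDATION_ERROR
}

// Discriminated union on `success`, so TypeScript narrows each branch.
type ApiResponse<T> =
  | { success: true; data: T }
  | { success: false; error: ApiError };

// Flattens an error into display lines: the top-level message, then per-field messages.
function errorLines(error: ApiError): string[] {
  const lines = [`${error.code}: ${error.message}`];
  const fields: Record<string, string[]> = error.fields ?? {};
  for (const [field, messages] of Object.entries(fields)) {
    for (const message of messages) {
      lines.push(`${field}: ${message}`);
    }
  }
  return lines;
}

// Returns the data on success and throws a readable Error otherwise.
function unwrap<T>(response: ApiResponse<T>): T {
  if (response.success) {
    return response.data;
  }
  throw new Error(errorLines(response.error).join("; "));
}
```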
Get Submission by ID
This public endpoint serves both frontend and backoffice applications. No authentication required.
Endpoint: GET /api/v1/text-to-speech/{{id}}
Authentication: Not required (public endpoint)
Path Parameters:
- id (string, required) - Submission UUID
Purpose: Universal endpoint that returns full submission details for both public playback and administrative views.
Use Cases:
- Frontend Public: /play/[id] route for audio playback by ID
- Frontend Client: "My Requests" details page (authenticated users viewing their requests)
- Frontend Backoffice: Admin request details page (admins viewing any request)
- Returns: Complete submission data including user info, audio URL, status, and metadata
Why Public?
- UUIDs are unguessable (e.g., 550e8400-e29b-41d4-a716-446655440000)
- Single endpoint serves multiple frontend use cases
- Simplifies frontend architecture (no need for separate authenticated endpoint)
Request Example:
GET /api/v1/text-to-speech/550e8400-e29b-41d4-a716-446655440000
Response: 200 OK
{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}
Error Response: 404 Not Found (Submission not found)
{
"success": false,
"error": {
"code": "SUBMISSION_NOT_FOUND",
"message": "Submission with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}
Testing with cURL
# Get submission by ID (no authentication required)
curl -X GET "http://localhost:5000/api/v1/text-to-speech/550e8400-e29b-41d4-a716-446655440000"
# Get submission by slug (minimal data)
curl -X GET "http://localhost:5000/api/v1/play/welcome-audio-service-1706882300456"
Expected Response (GET by ID - full details):
{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}
Expected Response (GET by slug - minimal data):
{
"success": true,
"data": {
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"text": "Welcome to our audio service. Listen to this message.",
"createdAt": "2024-01-22T14:30:00Z"
}
}
File Upload Strategy: S3 Pre-Signed URLs
Audio files can be uploaded directly to S3 using pre-signed URLs for better performance and scalability.
Why Pre-Signed URLs?
Benefits:
- Direct to S3: Files upload directly to S3, bypassing backend
- Faster uploads: No backend bottleneck
- Scalable: Backend doesn't handle large file streams
- Secure: Pre-signed URLs expire after 15 minutes
- Progress tracking: Frontend can show upload progress
Upload Flow:
- Request pre-signed URL from backend
- Upload file directly to S3 using the pre-signed URL
- Update submission with the S3 URL
Get Pre-Signed Upload URL
Endpoint: POST /api/v1/text-to-speech/upload-url
Description: Generates a pre-signed URL for uploading audio files directly to S3.
Headers:
Authorization: Bearer {{access-token}}
Content-Type: application/json
Request Body:
{
"fileName": "custom-audio.mp3",
"fileType": "audio/mpeg",
"fileSize": 3145728
}
Request Body Fields:
- fileName (string, required) - Original filename with extension
- fileType (string, required) - MIME type (must be audio/mpeg for MP3)
- fileSize (number, required) - File size in bytes (max 10MB = 10485760 bytes)
Response: 200 OK
{
"success": true,
"data": {
"uploadUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
"fileUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
"fileKey": "submissions/550e8400-custom.mp3",
"expiresIn": 900
},
"message": "Pre-signed URL generated successfully. Upload expires in 15 minutes."
}
Note: The file path includes the /submissions folder prefix for organizing TTS request audio files separately from voice gallery samples.
Response Fields:
- uploadUrl - Pre-signed URL for uploading (use this with PUT request)
- fileUrl - Final S3 URL after upload completes (use this when updating submission)
- fileKey - S3 object key
- expiresIn - Seconds until URL expires (900 = 15 minutes)
Error Response: 400 Bad Request (Invalid file type)
{
"success": false,
"error": {
"code": "INVALID_FILE_TYPE",
"message": "Only MP3 files are allowed. Received: audio/wav"
}
}
Error Response: 400 Bad Request (File too large)
{
"success": false,
"error": {
"code": "FILE_TOO_LARGE",
"message": "File size 12582912 bytes exceeds maximum of 10485760 bytes (10MB)"
}
}
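The two 400 responses above imply a file type check and a size check before a pre-signed URL is issued. The sketch below mirrors those checks so a client can fail fast before calling the API. The function name, the result type, and the order of the checks are assumptions; the limits, error codes, and messages are copied from the examples.

```typescript
const MAX_AUDIO_BYTES = 10 * 1024 * 1024; // 10MB = 10485760 bytes
const ALLOWED_AUDIO_TYPE = "audio/mpeg";

interface UploadUrlRequest {
  fileName: string;
  fileType: string;
  fileSize: number;
}

type UploadCheck =
  | { ok: true }
  | { ok: false; code: "INVALID_FILE_TYPE" | "FILE_TOO_LARGE"; message: string };

// Checks the MIME type first, then the size (the order is an assumption).
function checkUploadRequest(req: UploadUrlRequest): UploadCheck {
  if (req.fileType !== ALLOWED_AUDIO_TYPE) {
    return {
      ok: false,
      code: "INVALID_FILE_TYPE",
      message: `Only MP3 files are allowed. Received: ${req.fileType}`,
    };
  }
  if (req.fileSize > MAX_AUDIO_BYTES) {
    return {
      ok: false,
      code: "FILE_TOO_LARGE",
      message: `File size ${req.fileSize} bytes exceeds maximum of ${MAX_AUDIO_BYTES} bytes (10MB)`,
    };
  }
  return { ok: true };
}
```

A client-side check like this is a convenience only; the server still has to enforce the same limits.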
Upload File to S3 (Client-Side)
After receiving the pre-signed URL, upload the file directly to S3:
Request: PUT {{uploadUrl}}
Headers:
Content-Type: audio/mpeg
Body: Binary audio file data
Testing with cURL:
# Step 1: Get pre-signed URL
curl -X POST "http://localhost:5000/api/v1/text-to-speech/upload-url" \
-H "Authorization: Bearer {{access-token}}" \
-H "Content-Type: application/json" \
-d '{
"fileName": "custom-audio.mp3",
"fileType": "audio/mpeg",
"fileSize": 3145728
}'
# Step 2: Upload file to S3 (use uploadUrl from previous response)
curl -X PUT "{{upload-url-from-previous-response}}" \
-H "Content-Type: audio/mpeg" \
--data-binary "@/path/to/custom-audio.mp3"
# Step 3: Update submission with the fileUrl
curl -X PUT "http://localhost:5000/api/v1/text-to-speech/{{id}}" \
-H "Authorization: Bearer {{access-token}}" \
-H "Content-Type: application/json" \
-d '{
"audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
"status": "completed"
}'
Update Submission
Endpoint: PUT /api/v1/text-to-speech/{{id}}
Path Parameters:
- id (string, required) - Submission UUID
Headers:
Authorization: Bearer {{access-token}}
Content-Type: application/json
Request Body (JSON):
{
"voiceId": "bella-voice-id-789",
"audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
"status": "completed"
}
The text field cannot be modified after submission. Once a request is created, the text content is read-only. This ensures data integrity and prevents confusion during processing.
Request Body Fields (all optional):
- voiceId (string, optional) - Selected voice model ID
- audioUrl (string, optional) - S3 URL of uploaded audio file
- status (string, optional) - Submission status: "pending", "processing", "completed"
Processing Flow:
- Validate request fields
- If voiceId provided, verify it exists in Voice Gallery
- If audioUrl provided, update audio file URL (see File Upload Strategy for uploading files)
- If status changes to completed: Automatically generate unique slug from text content with timestamp/UUID suffix
- Automatically update updatedAt timestamp
- Update submission record
- Return updated submission entity (includes generated slug if status is completed)
Note: Only the submission owner or admin users can update submissions.
To upload audio files, use the S3 Pre-Signed URL approach. This allows you to upload files directly to S3, then update the submission with the audioUrl.
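The update rules above (read-only text, optional fields, slug generated once on completion) can be sketched as a pure function. This is an illustration, not the service's implementation: applyUpdate and its types are hypothetical, the slug generator is injected because its exact algorithm is not specified here, and the Voice Gallery lookup, permission check, and updatedAt handling are left out.

```typescript
type Status = "pending" | "processing" | "completed";

interface Submission {
  id: string;
  text: string;
  voiceId: string;
  slug: string | null;
  audioUrl: string | null;
  status: Status;
}

// Only these fields may change; `text` is deliberately absent because it is read-only.
interface UpdatePatch {
  voiceId?: string;
  audioUrl?: string;
  status?: Status;
}

// Returns a new submission with the patch applied; the input is not mutated.
function applyUpdate(
  current: Submission,
  patch: UpdatePatch,
  makeSlug: (text: string) => string,
): Submission {
  const next: Submission = {
    ...current,
    voiceId: patch.voiceId ?? current.voiceId,
    audioUrl: patch.audioUrl ?? current.audioUrl,
    status: patch.status ?? current.status,
  };
  // Generate the slug only on first completion; an existing slug never changes.
  if (next.status === "completed" && current.slug === null) {
    next.slug = makeSlug(current.text);
  }
  return next;
}
```

Keeping text out of UpdatePatch means an attempt to change it is rejected at compile time in typed callers; a real API would also need to reject or ignore it at runtime.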
Response: 200 OK
{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "bella-voice-id-789",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}
Error Response: 404 Not Found
{
"success": false,
"error": {
"code": "SUBMISSION_NOT_FOUND",
"message": "Submission with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}
Error Response: 403 Forbidden
{
"success": false,
"error": {
"code": "FORBIDDEN",
"message": "You do not have permission to update this submission."
}
}
Error Response: 400 Bad Request (Invalid Voice ID)
{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID 'invalid-voice-id' not found or not published."
}
}
Get Submission by Slug (Public)
Endpoint: GET /api/v1/play/{{slug}}
Path Parameters:
- slug (string, required) - Unique URL-safe slug
Authentication: Not required (public endpoint)
Purpose: Optimized endpoint for public playback pages that only need minimal data for audio playback.
Use Cases:
- Frontend: /play/[slug] route for shareable audio links
- Returns limited information for privacy (no user data, no internal IDs)
- Use GET /api/v1/text-to-speech/{id} for full submission details (frontend dashboards, backoffice)
- Use GET /api/v1/play/{slug} for minimal playback data (shareable public links)
Both endpoints are public and serve the same audio, but return different levels of detail.
Request Example:
GET /api/v1/play/happy-birthday-john
Response: 200 OK
{
"success": true,
"data": {
"slug": "happy-birthday-john",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/happy-birthday-john.mp3",
"text": "Happy birthday John! Wishing you all the best on your special day.",
"createdAt": "2024-01-22T14:30:00Z"
}
}
Note: This endpoint returns limited information (no user data, no internal IDs) for public playback, while GET /api/v1/text-to-speech/{id} returns complete submission details.
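The difference between the two public endpoints comes down to a projection from the full entity to four fields. A minimal sketch of that projection follows; the type and function names are hypothetical, and returning null for a submission that is not yet playable is an assumption about how a server might handle that case.

```typescript
// Subset of the full entity that matters for this sketch.
interface FullSubmission {
  id: string;
  text: string;
  voiceId: string;
  clientId: string;
  slug: string | null;
  audioUrl: string | null;
  status: string;
  createdAt: string;
  createdBy: { userId: string; userName: string };
}

// Minimal playback payload: no user data, no internal IDs.
interface PublicPlayback {
  slug: string;
  audioUrl: string;
  text: string;
  createdAt: string;
}

// Returns null when the submission is not yet playable (no slug or no audio).
function toPublicPlayback(s: FullSubmission): PublicPlayback | null {
  if (s.slug === null || s.audioUrl === null) {
    return null;
  }
  // Copy field by field so fields added to the entity later never leak by accident.
  return { slug: s.slug, audioUrl: s.audioUrl, text: s.text, createdAt: s.createdAt };
}
```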
Get All Submissions (Client and Admin)
This endpoint is available to both clients and admins, but returns different results:
- Admins: See all submissions from all users
- Clients: Automatically filtered to show only their own submissions
Endpoint: GET /api/v1/text-to-speech
Authentication: Required (Client or Admin role)
Headers:
Authorization: Bearer {{admin-access-token}}
Content-Type: application/json
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| page | integer | No | 1 | Page number for pagination |
| pageSize | integer | No | 20 | Number of items per page (max 100) |
| status | string | No | all | Filter by status: pending, processing, completed, all |
| clientId | string | No | - | Filter by specific client ID |
| sortBy | string | No | createdAt | Sort field: createdAt, characterCount, processingTime |
| sortOrder | string | No | desc | Sort order: asc or desc |
| search | string | No | - | Search in text content or slug |
Request Example:
# Get all submissions (first page)
curl -X GET "http://localhost:5000/api/v1/text-to-speech" \
-H "Authorization: Bearer {{admin-access-token}}"
# Get pending submissions only
curl -X GET "http://localhost:5000/api/v1/text-to-speech?status=pending" \
-H "Authorization: Bearer {{admin-access-token}}"
# Get page 2 with 50 items per page, sorted by character count
curl -X GET "http://localhost:5000/api/v1/text-to-speech?page=2&pageSize=50&sortBy=characterCount&sortOrder=desc" \
-H "Authorization: Bearer {{admin-access-token}}"
# Search submissions containing "welcome"
curl -X GET "http://localhost:5000/api/v1/text-to-speech?search=welcome" \
-H "Authorization: Bearer {{admin-access-token}}"
# Filter by client ID
curl -X GET "http://localhost:5000/api/v1/text-to-speech?clientId=client-abc-123" \
-H "Authorization: Bearer {{admin-access-token}}"
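The same query strings can be assembled programmatically. The sketch below builds the request path from a typed options object; buildListPath and ListQuery are hypothetical names, while the parameter names, allowed values, and the page size limit come from the query parameter table.

```typescript
interface ListQuery {
  page?: number;
  pageSize?: number;
  status?: "pending" | "processing" | "completed" | "all";
  clientId?: string;
  sortBy?: "createdAt" | "characterCount" | "processingTime";
  sortOrder?: "asc" | "desc";
  search?: string;
}

// Builds the path and query string; omitted parameters fall back to server defaults.
function buildListPath(query: ListQuery = {}): string {
  if (query.pageSize !== undefined && (query.pageSize < 1 || query.pageSize > 100)) {
    throw new RangeError("Page size must be between 1 and 100");
  }
  const pairs: string[] = [];
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined) {
      // Percent-encode both sides so free-text search terms are safe in a URL.
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  const base = "/api/v1/text-to-speech";
  return pairs.length > 0 ? `${base}?${pairs.join("&")}` : base;
}
```

Parameters appear in the order they are set on the options object, so buildListPath({ page: 2, pageSize: 50 }) produces page before pageSize.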
Response: 200 OK
{
"success": true,
"data": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
},
{
"id": "660e8400-e29b-41d4-a716-446655440001",
"text": "Happy birthday Sarah! Wishing you a wonderful year ahead.",
"voiceId": "adam-voice-id-456",
"clientId": "client-def-456",
"slug": "happy-birthday-sarah-1706882400789",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/happy-birthday-sarah-1706882400789.mp3",
"status": "completed",
"characterCount": 58,
"processingTime": 4200,
"createdAt": "2024-01-22T14:25:00Z",
"createdBy": {
"userId": "user-id-789",
"userName": "Jane Smith"
}
},
{
"id": "770e8400-e29b-41d4-a716-446655440002",
"text": "This is a test message for the audio service.",
"voiceId": "bella-voice-id-789",
"clientId": "client-abc-123",
"slug": null,
"audioUrl": null,
"status": "pending",
"characterCount": 46,
"processingTime": 0,
"createdAt": "2024-01-22T14:20:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
],
"pagination": {
"currentPage": 1,
"pageSize": 20,
"totalItems": 3,
"totalPages": 1,
"hasNextPage": false,
"hasPreviousPage": false
},
"filters": {
"status": "all",
"sortBy": "createdAt",
"sortOrder": "desc"
}
}
Response Fields:
- data (array) - Array of text-to-speech submission entities
- pagination (object) - Pagination metadata
  - currentPage (integer) - Current page number
  - pageSize (integer) - Items per page
  - totalItems (integer) - Total number of submissions matching filters
  - totalPages (integer) - Total number of pages
  - hasNextPage (boolean) - Whether there's a next page
  - hasPreviousPage (boolean) - Whether there's a previous page
- filters (object) - Applied filters and sorting
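The pagination fields are related by simple arithmetic, sketched below. The function name is hypothetical, and the behavior for an empty result set (zero total pages) is an assumption, since the document does not show that case.

```typescript
interface Pagination {
  currentPage: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNextPage: boolean;
  hasPreviousPage: boolean;
}

// Derives the pagination metadata from the total count and the requested page.
function paginationFor(totalItems: number, currentPage: number, pageSize: number): Pagination {
  // Round up so a partially filled last page still counts as a page.
  const totalPages = Math.ceil(totalItems / pageSize);
  return {
    currentPage,
    pageSize,
    totalItems,
    totalPages,
    hasNextPage: currentPage < totalPages,
    hasPreviousPage: currentPage > 1,
  };
}
```

For the sample response above, paginationFor(3, 1, 20) gives one total page with no next or previous page.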
Access Control Behavior:
- Admin users: Return all submissions from all clients (no filtering)
- Client users: Automatically filter by clientId to show only their own submissions
- Authentication is required for both roles
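The role-based behavior above can be expressed as a small filter. This is a sketch with hypothetical names; in particular, overriding a client's requested clientId with the one from their token is an assumption about how "automatically filter" is enforced.

```typescript
// Authenticated caller, as derived from the JWT (shape is an assumption).
interface AuthUser {
  role: "admin" | "client";
  clientId: string;
}

interface Owned {
  clientId: string;
}

// Clients are always scoped to their own clientId, whatever they request.
// Admins may optionally narrow the list with the clientId query parameter.
function effectiveClientFilter(user: AuthUser, requestedClientId?: string): string | undefined {
  return user.role === "client" ? user.clientId : requestedClientId;
}

// Applies the effective filter to an in-memory list of submissions.
function visibleTo<T extends Owned>(user: AuthUser, submissions: T[], requestedClientId?: string): T[] {
  const filter = effectiveClientFilter(user, requestedClientId);
  return filter === undefined ? submissions : submissions.filter((s) => s.clientId === filter);
}
```

In a real service the same filter would normally be pushed into the database query rather than applied in memory.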
Error Response: 400 Bad Request (Invalid parameters)
{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid query parameters",
"fields": {
"pageSize": ["Page size must be between 1 and 100"],
"status": ["Status must be one of: pending, processing, completed, all"]
}
}
}
Notes:
- Submissions are sorted by createdAt (newest first) by default
- Admins can view submissions from all clients
- Clients automatically see only their own submissions (filtered by clientId from JWT token)
- Maximum page size is 100 to prevent performance issues
Validation Rules
Text
- Required: Yes
- Type: String
- Min Length: 1 character
- Max Length: 1000 characters
- Pattern: All printable characters allowed
- Error Messages:
- Empty: "Text is required"
- Too short: "Text must be at least 1 character"
- Too long: "Text must not exceed 1000 characters"
Voice ID
- Required: Yes
- Type: String (UUID format)
- Validation: Must exist in Voice Gallery and be published
- Error Messages:
- Missing: "Voice ID is required"
- Invalid: "Voice with ID 'xxx' not found or not published"
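The text and voice ID rules above translate into a short validation function that returns errors in the same per-field shape as the VALIDATION_ERROR response. This is a sketch: validateSubmission is a hypothetical name, length is measured in UTF-16 code units because the document does not say how characters are counted, and the Voice Gallery lookup is not covered since it needs server-side data.

```typescript
const MAX_TEXT_LENGTH = 1000;

// Per-field error messages, matching the `fields` object in the 400 response.
type FieldErrors = { text?: string[]; voiceId?: string[] };

// Accepts untrusted input, so both fields are typed as unknown.
function validateSubmission(input: { text?: unknown; voiceId?: unknown }): FieldErrors {
  const errors: FieldErrors = {};
  const text = input.text;
  const voiceId = input.voiceId;

  if (typeof text !== "string" || text.length === 0) {
    errors.text = ["Text is required"];
  } else if (text.length > MAX_TEXT_LENGTH) {
    errors.text = ["Text must not exceed 1000 characters"];
  }

  if (typeof voiceId !== "string" || voiceId.length === 0) {
    errors.voiceId = ["Voice ID is required"];
  }
  return errors;
}
```

An empty result object means the input passed these checks; the API can still reject the voice ID with VOICE_NOT_FOUND.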
Client ID
- Auto-Populated: Yes (from JWT token, NOT sent in request)
- Type: String
- Source: Extracted from authenticated user's JWT token
Slug
- Auto-Generated: Yes (server-side, generated when status changes to completed)
- Type: String | Null
- Initial Value: NULL (when submission is created)
- Generation: Automatically created when status is updated to completed
  - Generated from text content (URL-safe) with timestamp or UUID suffix (e.g., welcome-audio-123456789)
  - Once generated, the slug never changes; it remains stable for the lifetime of the submission
  - Used for shareable playback links (e.g., https://micdots.com/play/{{slug}})
- Uniqueness: Must be unique across all submissions
- Pattern: Lowercase letters, numbers, and hyphens only
- Max Length: 200 characters
Important: The slug is automatically generated when the submission status changes to completed. This ensures every completed submission has a permanent, shareable playback link.
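One way to satisfy the slug rules above (lowercase letters, numbers, hyphens, at most 200 characters, unique suffix) is sketched below. The exact algorithm is not specified in this document, and the sample slugs suggest the real generator also drops some words, so slugify here is a hypothetical illustration rather than the service's implementation. The suffix is assumed to be lowercase alphanumeric (for example a millisecond timestamp).

```typescript
const MAX_SLUG_LENGTH = 200;

// Matches the documented pattern: lowercase letters, numbers, and single hyphens between them.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Builds "<text-part>-<suffix>", truncating the text part to respect the length limit.
function slugify(text: string, suffix: string): string {
  const base = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every run of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  // Reserve room for the joining hyphen and the suffix, then drop a hyphen left dangling by the cut.
  const room = Math.max(MAX_SLUG_LENGTH - suffix.length - 1, 0);
  const trimmed = base.slice(0, room).replace(/-+$/, "");
  // Text with no usable characters falls back to the suffix alone.
  return trimmed.length > 0 ? `${trimmed}-${suffix}` : suffix;
}
```

Uniqueness still has to be enforced by the datastore (for example with a unique index), since two submissions with the same text could in principle receive the same suffix.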
Dependencies
AWS S3 Storage
Audio files are stored in AWS S3.
S3 Bucket Configuration:
- Audio Bucket: micdots-audio (single bucket for all audio files)
- Region: us-east-1
- Access: Public read for audio files
- Audio Format: MP3
- Folder Structure:
  - /voice-samples - Voice gallery audio samples
  - /submissions - User TTS request audio files
- File Naming: {{folder}}/{{slug}}.mp3
  - The slug already contains uniqueness (timestamp/UUID suffix)
  - Example: submissions/welcome-audio-123456789.mp3
  - Example: voice-samples/rachel-sample-001.mp3
The same S3 bucket is used for both voice gallery samples and user submissions, organized in separate folders for better management and access control.
Integration Flow
Epic 1 Flow (Manual Processing)
Epic 1 Characteristics:
- Manual audio processing by admin
- Initial status: pending (slug is NULL)
- Admin obtains audio from any external source
- Admin uploads audio file to S3 using pre-signed URLs
- Admin updates submission with audioUrl and status completed
- Slug is auto-generated when status changes to completed
- Single S3 bucket (micdots-audio)
In future releases, the flow will be automated with TTS API integration for immediate audio generation.
Security Considerations
Authorization
- Create submissions (POST): Requires authentication
- Update submissions (PUT): Requires authentication (only owner or admin)
- Get submission by ID (GET /api/v1/text-to-speech/{id}): Public (no auth) - UUIDs provide security through obscurity
- Get submission by slug (GET /api/v1/play/{slug}): Public (no auth) - designed for sharing
- List submissions (GET /api/v1/text-to-speech): Requires authentication, role-based filtering:
  - Clients: Only see their own submissions (filtered by clientId)
  - Admins: See all submissions from all clients
Input Validation
- Sanitize all text inputs to prevent XSS attacks
- Validate voice ID exists and is published
- Limit request rate to prevent abuse (10 requests per minute per user)
Rate Limiting
- Create (POST): 10 requests per minute per authenticated user
- Update (PUT): 20 requests per minute per authenticated user
- Get by ID (GET /api/v1/text-to-speech/{id}): 1000 requests per minute (global, public endpoint)
- Get by Slug (GET /api/v1/play/{slug}): 1000 requests per minute (global, public endpoint)
- List submissions (GET /api/v1/text-to-speech): 100 requests per minute per authenticated user
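The limits above could be enforced with a per-key counter. The document does not say which algorithm is used, so the sketch below assumes a simple fixed one-minute window; the class name and key format are hypothetical, and an in-memory map like this only works for a single server process.

```typescript
// Fixed-window rate limiter: at most `limit` requests per key in each window.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. nowMs is injectable so the logic can be tested.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const current = this.windows.get(key);
    // Start a fresh window for a new key or once the previous window has elapsed.
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}

// Create (POST): 10 requests per minute per authenticated user, keyed by user ID.
const createLimiter = new FixedWindowLimiter(10);
```

For the global limits on the public endpoints, a single shared key (for example the endpoint name) would be used instead of a user ID.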
Data Privacy
- GET /api/v1/text-to-speech/{id}: Returns full submission details including user info (createdBy)
- UUIDs are hard to guess, but this is obscurity rather than access control: anyone who obtains an ID can read the full record
- Consider user consent for data sharing
- GET /api/v1/play/{slug}: Minimal data endpoint (no user info, no internal IDs)
- Only returns slug, audio URL, text, and timestamp
- Designed for public sharing without privacy concerns
Future Enhancements (Not in MVP)
The following features are planned for future releases but are NOT included in Epic 1 MVP:
- Text translation before speech generation
- Multiple language support
- Custom voice cloning
- Batch text-to-speech processing
- Audio editing and effects
- Voice speed and pitch controls
- SSML (Speech Synthesis Markup Language) support
- Analytics tracking for audio playback