Text-to-Speech Submission API

Text-to-speech submission system for converting user text input into AI-generated audio with shareable links.

BASE API ENDPOINT

/api/v1/text-to-speech

Epic 1 - TTS Submission and Get Link

Epic 1 provides manual TTS processing and shareable audio links:

User submits text + voice selection
Admin manually processes request (obtains audio from external source)
Admin uploads audio file to S3
User receives shareable link after admin completes processing
QR code generation deferred to future releases

User Flow: Submit Text → Manual Admin Processing → Admin Uploads Audio → Get Shareable Link

Text-to-Speech Submission Entity

Entity Schema

interface SubmissionEntity {
  id: string; // UUID - Submission identifier
  text: string; // User input text (max 1000 characters)
  voiceId: string; // Selected voice ID from Voice Gallery
  clientId: string; // Client identifier (auto-populated from JWT token)
  slug: string | null; // Unique URL-safe identifier (NULL until status is 'completed')
  audioUrl: string | null; // S3 URL of generated audio file (NULL until uploaded)
  status: "pending" | "processing" | "completed"; // Submission status
  characterCount: number; // Number of characters in text
  processingTime: number; // Time taken to generate (in milliseconds)
  createdAt: string; // ISO 8601 timestamp - Submission creation time
  createdBy: {
    userId: string; // User ID from Microsoft Identity
    userName: string; // User display name
  };
}

Field Descriptions:

id: Unique identifier (UUID) for the submission
text: User's input text for audio generation (max 1000 characters)
voiceId: ID of the selected voice from Voice Gallery
clientId: Client identifier (automatically populated from JWT token)
slug: Unique URL-safe identifier for sharing the submission (NULL until status is completed)
audioUrl: S3 URL of the generated audio file (NULL until uploaded)
status: Current submission status ("pending", "processing", "completed")
characterCount: Number of characters in the input text
processingTime: Time taken to generate audio (in milliseconds)
createdAt: Timestamp when the submission was created (ISO 8601 format)
createdBy: Information about the user who created the submission
- userId: User ID from Microsoft Identity
- userName: User's display name

Notes:

Audio files are manually generated by admin and uploaded to S3 (Epic 1)
Slug is auto-generated when status changes to completed - generated from text content (URL-safe, unique) with timestamp/UUID suffix
Once generated, the slug never changes - provides stable shareable links
Playback URL format: https://micdots.com/play/{{slug}}
S3 filenames are based on the slug: {{slug}}.mp3 (same for both normal and dummy modes)
clientId and createdBy are automatically populated from the authenticated user's JWT token (NOT sent in request)
Processing time: Varies based on manual admin processing
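The schema and notes above imply a few consistency rules between status, slug, audioUrl, text and characterCount. The TypeScript sketch below checks them. checkSubmissionInvariants is a hypothetical helper, not part of the API, and two of its rules are assumptions: that a completed submission always has an audioUrl, and that characterCount counts Unicode code points.

```typescript
type SubmissionStatus = "pending" | "processing" | "completed";

interface SubmissionEntity {
  id: string;
  text: string;
  voiceId: string;
  clientId: string;
  slug: string | null;
  audioUrl: string | null;
  status: SubmissionStatus;
  characterCount: number;
  processingTime: number;
  createdAt: string;
  createdBy: { userId: string; userName: string };
}

// Hypothetical helper: returns human-readable rule violations (empty array = consistent).
function checkSubmissionInvariants(s: SubmissionEntity): string[] {
  const problems: string[] = [];
  // Assumption: "characters" means Unicode code points, so Array.from is used instead of .length.
  const length = Array.from(s.text).length;
  if (length < 1 || length > 1000) problems.push("text must be 1-1000 characters");
  if (s.characterCount !== length) problems.push("characterCount does not match the text");
  if (s.status === "completed") {
    // The slug is generated on completion, so it must exist from then on.
    if (s.slug === null) problems.push("completed submission has no slug");
    // Assumption: the admin always sets audioUrl before (or together with) status completed.
    if (s.audioUrl === null) problems.push("completed submission has no audioUrl");
  } else if (s.slug !== null) {
    problems.push("slug must stay null until status is completed");
  }
  return problems;
}
```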

Entity Example

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Welcome to our audio service. Listen to this message.",
  "voiceId": "rachel-voice-id-123",
  "clientId": "client-abc-123",
  "slug": "welcome-audio-service-1706882400123",
  "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882400123.mp3",
  "status": "completed",
  "characterCount": 53,
  "processingTime": 5420,
  "createdAt": "2024-01-22T14:30:00Z",
  "createdBy": {
    "userId": "user-id-456",
    "userName": "John Doe"
  }
}

Notes:

The slug includes a timestamp suffix (1706882400123) to ensure uniqueness across all submissions
Once generated, the slug never changes

Form Fields Reference

The following fields are used in the submission form:

Field Name	Type	Required	Description
`text`	string	Yes	User's text input (1-1000 characters)
`voiceId`	string	Yes	Selected voice model ID from Voice Gallery

Auto-populated fields (NOT sent in request):

clientId - Automatically extracted from the authenticated user's JWT token
createdBy - Automatically populated with user ID and name from JWT token

API Endpoint

Submit Text-to-Speech Request

MVP 1 Feature

This endpoint is available in MVP 1 for authenticated users to submit text for audio generation and obtain a shareable playback link.

Endpoint: POST /api/v1/text-to-speech

Headers:

Authorization: Bearer {{access-token}}
Content-Type: application/json

Request Body (JSON):

{
  "text": "Welcome to our audio service. Listen to this message.",
  "voiceId": "rachel-voice-id-123"
}

Form Fields: See Form Fields Reference above for complete field documentation.

Processing Flow (Epic 1 - Manual):

Validate text length and voice ID
Automatically populate clientId and createdBy from the authenticated user's JWT token (NOT sent in request)
Create submission record with status pending (slug is NULL initially)
Return submission entity (without audio URL or slug initially)

Note: At this point, the user receives a submission confirmation and must wait while the admin processes the request.
Admin processes manually:
- Admin generates audio file offline from external source
- Admin requests pre-signed URL: POST /api/v1/text-to-speech/upload-url
- Admin uploads MP3 file directly to S3
- Admin updates submission: PUT /api/v1/text-to-speech/{{id}} with audioUrl and status completed
System automatically generates unique slug when status changes to completed:
- Generated from text content with timestamp/UUID suffix (e.g., welcome-audio-123456789)
- Slug is permanent and never changes after generation
- Playback URL created: https://micdots.com/play/{{slug}}

Important: The slug is generated automatically when the submission status is updated to completed. This ensures every completed submission has a shareable playback link.
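The flow above does not pin down how the slug is derived from the text. The sketch below is one hypothetical way to do it: normalize the text, drop a small stop-word list, keep the first three remaining words and append a millisecond timestamp. generateSlug, STOP_WORDS and the three-word limit are assumptions, chosen because they happen to reproduce the example slugs in this document; they are not the service's actual implementation.

```typescript
// Hypothetical stop-word list; the real service's word selection is not documented.
const STOP_WORDS = new Set(["a", "an", "the", "to", "our", "of", "and", "for", "is", "this"]);
const MAX_SLUG_LENGTH = 200;

// Builds "<first words>-<timestamp>" using only lowercase letters, digits and hyphens.
function generateSlug(text: string, nowMs: number, maxWords = 3): string {
  const words = text
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks (é becomes e)
    .toLowerCase()
    .split(/[^a-z0-9]+/) // anything that is not a-z or 0-9 acts as a separator
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word))
    .slice(0, maxWords);
  const suffix = String(nowMs);
  const budget = MAX_SLUG_LENGTH - suffix.length - 1; // room left for "<base>-"
  const base = (words.length > 0 ? words.join("-") : "audio").slice(0, budget).replace(/-+$/, "");
  return `${base}-${suffix}`;
}
```

A timestamp suffix alone does not guarantee uniqueness if two submissions with similar text complete in the same millisecond, so an implementation along these lines would still check the result against existing slugs (or use a UUID fragment, which the notes also allow).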

Future Automation

In future releases, the processing flow will be automated:

Automated text-to-speech generation (via TTS API)
Audio automatically uploaded to S3
Immediate completion (no manual admin processing)

Note: clientId and createdBy are NOT sent in the request. The API automatically extracts the client ID, user ID, and user name from the authenticated user's JWT token and populates these fields.

Response: 201 Created

{
  "success": true,
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Welcome to our audio service. Listen to this message.",
    "voiceId": "rachel-voice-id-123",
    "clientId": "client-abc-123",
    "slug": null,
    "audioUrl": null,
    "status": "pending",
    "characterCount": 53,
    "processingTime": 0,
    "createdAt": "2024-01-22T14:30:00Z",
    "createdBy": {
      "userId": "user-id-456",
      "userName": "John Doe"
    }
  },
  "links": {
    "share": null,
    "audio": null
  }
}

Error Response: 400 Bad Request (Validation Error)

{
  "success": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Validation failed",
    "fields": {
      "text": ["Text must be between 1 and 1000 characters"],
      "voiceId": ["Voice ID is required"]
    }
  }
}

Error Response: 404 Not Found (Invalid Voice ID)

{
  "success": false,
  "error": {
    "code": "VOICE_NOT_FOUND",
    "message": "Voice with ID 'invalid-voice-id' not found or not published."
  }
}

Error Response: 500 Internal Server Error (Processing Failed)

{
  "success": false,
  "error": {
    "code": "PROCESSING_FAILED",
    "message": "Failed to process request. Please try again."
  }
}
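For reference, the submit request documented above can be sent from TypeScript as in this sketch. It takes the fetch function as a parameter (FetchLike is a hypothetical minimal type, so the example does not depend on DOM typings) and deliberately sends only text and voiceId. submitTextToSpeech is an illustrative name, not an SDK function.

```typescript
// Minimal fetch-compatible signature, so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

interface SubmitRequest {
  text: string;
  voiceId: string;
}

// Hypothetical client for POST /api/v1/text-to-speech.
async function submitTextToSpeech(
  baseUrl: string,
  accessToken: string,
  request: SubmitRequest,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const response = await fetchImpl(`${baseUrl}/api/v1/text-to-speech`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    // Only the two form fields; clientId and createdBy are derived server-side from the JWT.
    body: JSON.stringify({ text: request.text, voiceId: request.voiceId }),
  });
  const payload = await response.json();
  if (response.status !== 201) {
    // Error bodies use { success: false, error: { code, message, fields? } }.
    throw new Error(`Submission failed with HTTP ${response.status}: ${JSON.stringify(payload)}`);
  }
  return payload;
}
```

In Node.js 18+ or a browser, the global fetch can usually be passed as fetchImpl. Per the processing flow, the returned submission is pending until an admin completes it.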

Get Submission by ID

Universal Endpoint

This public endpoint serves both frontend and backoffice applications. No authentication required.

Endpoint: GET /api/v1/text-to-speech/{{id}}

Authentication: Not required (public endpoint)

Path Parameters:

id (string, required) - Submission UUID

Purpose: Universal endpoint that returns full submission details for both public playback and administrative views.

Use Cases:

Frontend Public: /play/[id] route for audio playback by ID
Frontend Client: "My Requests" details page (authenticated users viewing their requests)
Frontend Backoffice: Admin request details page (admins viewing any request)
Returns: Complete submission data including user info, audio URL, status, and metadata

Why Public?

UUIDs are unguessable (e.g., 550e8400-e29b-41d4-a716-446655440000)
Single endpoint serves multiple frontend use cases
Simplifies frontend architecture (no need for separate authenticated endpoint)

Request Example:

GET /api/v1/text-to-speech/550e8400-e29b-41d4-a716-446655440000

Response: 200 OK

{
  "success": true,
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Welcome to our audio service. Listen to this message.",
    "voiceId": "rachel-voice-id-123",
    "clientId": "client-abc-123",
    "slug": "welcome-audio-service-1706882300456",
    "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
    "status": "completed",
    "characterCount": 53,
    "processingTime": 5420,
    "createdAt": "2024-01-22T14:30:00Z",
    "createdBy": {
      "userId": "user-id-456",
      "userName": "John Doe"
    }
  }
}

Error Response: 404 Not Found (Submission not found)

{
  "success": false,
  "error": {
    "code": "SUBMISSION_NOT_FOUND",
    "message": "Submission with ID '550e8400-e29b-41d4-a716-446655440000' not found."
  }
}
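A client for this public endpoint mainly has to distinguish the 200 and 404 cases. The sketch below maps SUBMISSION_NOT_FOUND to null and treats anything else unexpected as an error. getSubmissionById, FetchLike and ApiEnvelope are hypothetical names, and no Authorization header is sent because the endpoint is documented as public.

```typescript
// Minimal fetch-compatible signature (hypothetical), so the sketch stays testable.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number; json(): Promise<unknown> }>;

interface ApiEnvelope {
  success: boolean;
  data?: Record<string, unknown>;
  error?: { code: string; message: string };
}

// Hypothetical client: returns the submission, or null if the API reports SUBMISSION_NOT_FOUND.
async function getSubmissionById(
  baseUrl: string,
  id: string,
  fetchImpl: FetchLike,
): Promise<Record<string, unknown> | null> {
  // No Authorization header: this endpoint is public.
  const response = await fetchImpl(`${baseUrl}/api/v1/text-to-speech/${encodeURIComponent(id)}`, {
    method: "GET",
    headers: {},
  });
  const body = (await response.json()) as ApiEnvelope;
  if (response.status === 404 && body.error?.code === "SUBMISSION_NOT_FOUND") return null;
  if (response.status !== 200 || !body.success || body.data === undefined) {
    throw new Error(`Unexpected HTTP ${response.status}: ${body.error?.message ?? "no error message"}`);
  }
  return body.data;
}
```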

Testing with cURL

# Get submission by ID (no authentication required)
curl -X GET "http://localhost:5000/api/v1/text-to-speech/550e8400-e29b-41d4-a716-446655440000"

# Get submission by slug (minimal data)
curl -X GET "http://localhost:5000/api/v1/play/welcome-audio-service-1706882300456"

Expected Response (GET by ID - full details):

{
  "success": true,
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Welcome to our audio service. Listen to this message.",
    "voiceId": "rachel-voice-id-123",
    "clientId": "client-abc-123",
    "slug": "welcome-audio-service-1706882300456",
    "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
    "status": "completed",
    "characterCount": 53,
    "processingTime": 5420,
    "createdAt": "2024-01-22T14:30:00Z",
    "createdBy": {
      "userId": "user-id-456",
      "userName": "John Doe"
    }
  }
}

Expected Response (GET by slug - minimal data):

{
  "success": true,
  "data": {
    "slug": "welcome-audio-service-1706882300456",
    "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
    "text": "Welcome to our audio service. Listen to this message.",
    "createdAt": "2024-01-22T14:30:00Z"
  }
}

File Upload Strategy: S3 Pre-Signed URLs

Audio files can be uploaded directly to S3 using pre-signed URLs for better performance and scalability.

Why Pre-Signed URLs?

Benefits:

Direct to S3: Files upload directly to S3, bypassing backend
Faster uploads: No backend bottleneck
Scalable: Backend doesn't handle large file streams
Secure: Pre-signed URLs expire after 15 minutes
Progress tracking: Frontend can show upload progress

Upload Flow:

Request pre-signed URL from backend
Upload file directly to S3 using the pre-signed URL
Update submission with the S3 URL

Get Pre-Signed Upload URL

Endpoint: POST /api/v1/text-to-speech/upload-url

Description: Generates a pre-signed URL for uploading audio files directly to S3.

Headers:

Authorization: Bearer {{access-token}}
Content-Type: application/json

Request Body:

{
  "fileName": "custom-audio.mp3",
  "fileType": "audio/mpeg",
  "fileSize": 3145728
}

Request Body Fields:

fileName (string, required) - Original filename with extension
fileType (string, required) - MIME type (must be audio/mpeg for MP3)
fileSize (number, required) - File size in bytes (max 10MB = 10485760 bytes)

Response: 200 OK

{
  "success": true,
  "data": {
    "uploadUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
    "fileUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
    "fileKey": "submissions/550e8400-custom.mp3",
    "expiresIn": 900
  },
  "message": "Pre-signed URL generated successfully. Upload expires in 15 minutes."
}

Note: The file path includes the /submissions folder prefix for organizing TTS request audio files separately from voice gallery samples.

Response Fields:

uploadUrl - Pre-signed URL for uploading (use this with PUT request)
fileUrl - Final S3 URL after upload completes (use this when updating submission)
fileKey - S3 object key
expiresIn - Seconds until URL expires (900 = 15 minutes)

Error Response: 400 Bad Request (Invalid file type)

{
  "success": false,
  "error": {
    "code": "INVALID_FILE_TYPE",
    "message": "Only MP3 files are allowed. Received: audio/wav"
  }
}

Error Response: 400 Bad Request (File too large)

{
  "success": false,
  "error": {
    "code": "FILE_TOO_LARGE",
    "message": "File size 12582912 bytes exceeds maximum of 10485760 bytes (10MB)"
  }
}
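The two 400 responses above correspond to simple checks that a frontend can also run before requesting a URL, saving a round trip for obviously invalid files. This sketch mirrors the documented rules (MP3 only, at most 10485760 bytes, with the limit treated as inclusive by assumption). validateUploadRequest is a hypothetical helper; the server remains the authority.

```typescript
interface UploadUrlRequest {
  fileName: string;
  fileType: string;
  fileSize: number;
}

interface UploadValidationError {
  code: "INVALID_FILE_TYPE" | "FILE_TOO_LARGE";
  message: string;
}

const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10MB = 10485760 bytes

// Hypothetical client-side pre-check; returns null when the request looks acceptable.
function validateUploadRequest(req: UploadUrlRequest): UploadValidationError | null {
  if (req.fileType !== "audio/mpeg") {
    return { code: "INVALID_FILE_TYPE", message: `Only MP3 files are allowed. Received: ${req.fileType}` };
  }
  if (req.fileSize > MAX_UPLOAD_BYTES) {
    return {
      code: "FILE_TOO_LARGE",
      message: `File size ${req.fileSize} bytes exceeds maximum of ${MAX_UPLOAD_BYTES} bytes (10MB)`,
    };
  }
  return null;
}
```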

Upload File to S3 (Client-Side)

After receiving the pre-signed URL, upload the file directly to S3:

Request: PUT {{uploadUrl}}

Headers:

Content-Type: audio/mpeg

Body: Binary audio file data

Testing with cURL:

# Step 1: Get pre-signed URL
curl -X POST "http://localhost:5000/api/v1/text-to-speech/upload-url" \
  -H "Authorization: Bearer {{access-token}}" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "custom-audio.mp3",
    "fileType": "audio/mpeg",
    "fileSize": 3145728
  }'

# Step 2: Upload file to S3 (use uploadUrl from previous response)
curl -X PUT "{{upload-url-from-previous-response}}" \
  -H "Content-Type: audio/mpeg" \
  --data-binary "@/path/to/custom-audio.mp3"

# Step 3: Update submission with the fileUrl
curl -X PUT "http://localhost:5000/api/v1/text-to-speech/{{id}}" \
  -H "Authorization: Bearer {{access-token}}" \
  -H "Content-Type: application/json" \
  -d '{
    "audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
    "status": "completed"
  }'
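The three cURL steps can be chained in TypeScript roughly as follows. This is a sketch under stated assumptions: uploadAndComplete and FetchLike are hypothetical names, S3 is assumed to answer the PUT with HTTP 200, and the audio is already in memory as a Uint8Array (a Node.js Buffer qualifies). Step 3 sends fileUrl, not the signed uploadUrl, and the S3 request carries no bearer token.

```typescript
// Minimal fetch-compatible signature (hypothetical), so the sketch stays testable.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string | Uint8Array },
) => Promise<{ status: number; json(): Promise<unknown> }>;

interface UploadUrlData {
  uploadUrl: string;
  fileUrl: string;
  fileKey: string;
  expiresIn: number;
}

// Hypothetical helper chaining the three documented steps for one submission.
async function uploadAndComplete(
  baseUrl: string,
  accessToken: string,
  submissionId: string,
  fileName: string,
  audio: Uint8Array,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const apiHeaders = { Authorization: `Bearer ${accessToken}`, "Content-Type": "application/json" };

  // Step 1: request a pre-signed URL from the backend.
  const urlResponse = await fetchImpl(`${baseUrl}/api/v1/text-to-speech/upload-url`, {
    method: "POST",
    headers: apiHeaders,
    body: JSON.stringify({ fileName, fileType: "audio/mpeg", fileSize: audio.byteLength }),
  });
  if (urlResponse.status !== 200) throw new Error(`upload-url failed: HTTP ${urlResponse.status}`);
  const { data } = (await urlResponse.json()) as { data: UploadUrlData };

  // Step 2: PUT the bytes directly to S3; the signature lives in the URL, so no bearer token.
  const s3Response = await fetchImpl(data.uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": "audio/mpeg" },
    body: audio,
  });
  if (s3Response.status !== 200) throw new Error(`S3 upload failed: HTTP ${s3Response.status}`);

  // Step 3: store the final fileUrl (not the expiring uploadUrl) and mark the submission completed.
  const updateResponse = await fetchImpl(`${baseUrl}/api/v1/text-to-speech/${encodeURIComponent(submissionId)}`, {
    method: "PUT",
    headers: apiHeaders,
    body: JSON.stringify({ audioUrl: data.fileUrl, status: "completed" }),
  });
  if (updateResponse.status !== 200) throw new Error(`update failed: HTTP ${updateResponse.status}`);
  return updateResponse.json();
}
```

The S3 PUT should use the same Content-Type the URL was signed for; if the backend includes the content type in the signature, a mismatched header makes S3 reject the upload. The upload also has to start before expiresIn (900 seconds) elapses.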

Update Submission

Endpoint: PUT /api/v1/text-to-speech/{{id}}

Path Parameters:

id (string, required) - Submission UUID

Headers:

Authorization: Bearer {{access-token}}
Content-Type: application/json

Request Body (JSON):

{
  "voiceId": "bella-voice-id-789",
  "audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
  "status": "completed"
}

Text Field is Read-Only

The text field cannot be modified after submission. Once a request is created, the text content is read-only. This ensures data integrity and prevents confusion during processing.

Request Body Fields (all optional):

voiceId (string, optional) - Selected voice model ID
audioUrl (string, optional) - S3 URL of uploaded audio file
status (string, optional) - Submission status: "pending", "processing", "completed"

Processing Flow:

Validate request fields
If voiceId provided, verify it exists in Voice Gallery
If audioUrl provided, update audio file URL (see File Upload Strategy for uploading files)
If status changes to completed: Automatically generate unique slug from text content with timestamp/UUID suffix
Automatically update updatedAt timestamp
Update submission record
Return updated submission entity (includes generated slug if status is completed)

Note: Only the submission owner or admin users can update submissions.

Upload Audio Files

To upload audio files, use the S3 Pre-Signed URL approach. This allows you to upload files directly to S3, then update the submission with the audioUrl.

Response: 200 OK

{
  "success": true,
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Welcome to our audio service. Listen to this message.",
    "voiceId": "bella-voice-id-789",
    "clientId": "client-abc-123",
    "slug": "welcome-audio-service-1706882300456",
    "audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
    "status": "completed",
    "characterCount": 53,
    "processingTime": 5420,
    "createdAt": "2024-01-22T14:30:00Z",
    "createdBy": {
      "userId": "user-id-456",
      "userName": "John Doe"
    }
  }
}

Error Response: 404 Not Found

{
  "success": false,
  "error": {
    "code": "SUBMISSION_NOT_FOUND",
    "message": "Submission with ID '550e8400-e29b-41d4-a716-446655440000' not found."
  }
}

Error Response: 403 Forbidden

{
  "success": false,
  "error": {
    "code": "FORBIDDEN",
    "message": "You do not have permission to update this submission."
  }
}

Error Response: 400 Bad Request (Invalid Voice ID)

{
  "success": false,
  "error": {
    "code": "VOICE_NOT_FOUND",
    "message": "Voice with ID 'invalid-voice-id' not found or not published."
  }
}
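The update flow described in this section can be expressed as a pure function, which makes two rules explicit: text is never copied from the request, and the slug is generated only the first time a submission reaches completed. This is a server-side sketch, not the actual implementation. applySubmissionUpdate, Caller, the set of published voice IDs and the injected makeSlug function are hypothetical, updatedAt handling is omitted, and the status code for an unknown voice follows the 400 example in this section.

```typescript
type Status = "pending" | "processing" | "completed";

interface Submission {
  id: string;
  text: string;
  voiceId: string;
  clientId: string;
  slug: string | null;
  audioUrl: string | null;
  status: Status;
  createdBy: { userId: string; userName: string };
}

// Fields accepted by PUT /api/v1/text-to-speech/{id}; `text` is intentionally absent.
interface SubmissionUpdate {
  voiceId?: string;
  audioUrl?: string;
  status?: Status;
}

interface Caller {
  userId: string;
  isAdmin: boolean;
}

type UpdateResult =
  | { ok: true; submission: Submission }
  | { ok: false; httpStatus: 400 | 403; code: "VOICE_NOT_FOUND" | "FORBIDDEN"; message: string };

// Hypothetical server-side sketch of the documented update flow.
function applySubmissionUpdate(
  current: Submission,
  update: SubmissionUpdate,
  caller: Caller,
  publishedVoiceIds: ReadonlySet<string>,
  makeSlug: (text: string) => string,
): UpdateResult {
  // Only the owner or an admin may update.
  if (!caller.isAdmin && caller.userId !== current.createdBy.userId) {
    return { ok: false, httpStatus: 403, code: "FORBIDDEN", message: "You do not have permission to update this submission." };
  }
  if (update.voiceId !== undefined && !publishedVoiceIds.has(update.voiceId)) {
    return { ok: false, httpStatus: 400, code: "VOICE_NOT_FOUND", message: `Voice with ID '${update.voiceId}' not found or not published.` };
  }
  // Copy only the three updatable fields; any `text` in the request body is ignored.
  const next: Submission = {
    ...current,
    voiceId: update.voiceId ?? current.voiceId,
    audioUrl: update.audioUrl ?? current.audioUrl,
    status: update.status ?? current.status,
  };
  // Generate the slug the first time the submission is completed, and never again.
  if (next.status === "completed" && current.slug === null) {
    next.slug = makeSlug(current.text);
  }
  return { ok: true, submission: next };
}
```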

Get Submission by Slug (Public)

Endpoint: GET /api/v1/play/{{slug}}

Path Parameters:

slug (string, required) - Unique URL-safe slug

Authentication: Not required (public endpoint)

Purpose: Optimized endpoint for public playback pages that only need minimal data for audio playback.

Use Cases:

Frontend: /play/[slug] route for shareable audio links
Returns limited information for privacy (no user data, no internal IDs)

ID vs Slug

Use GET /api/v1/text-to-speech/{id} for full submission details (frontend dashboards, backoffice)
Use GET /api/v1/play/{slug} for minimal playback data (shareable public links)

Both endpoints are public and serve the same audio, but return different levels of detail.

Request Example:

GET /api/v1/play/happy-birthday-john

Response: 200 OK

{
  "success": true,
  "data": {
    "slug": "happy-birthday-john",
    "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/happy-birthday-john.mp3",
    "text": "Happy birthday John! Wishing you all the best on your special day.",
    "createdAt": "2024-01-22T14:30:00Z"
  }
}

Note: This endpoint returns limited information (no user data, no internal IDs) for public playback, while GET /api/v1/text-to-speech/{id} returns complete submission details.
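One way to make sure the slug endpoint never leaks createdBy, clientId or id is to build its payload from an explicit allow-list of fields instead of deleting fields from the full entity. The sketch below shows that projection. toPlaybackView is a hypothetical name, the entity type is trimmed to the fields that matter here, and returning null for submissions that are not yet playable is an assumption.

```typescript
interface SubmissionEntity {
  id: string;
  text: string;
  clientId: string;
  slug: string | null;
  audioUrl: string | null;
  status: "pending" | "processing" | "completed";
  createdAt: string;
  createdBy: { userId: string; userName: string };
}

interface PlaybackView {
  slug: string;
  audioUrl: string;
  text: string;
  createdAt: string;
}

// Hypothetical projection for GET /api/v1/play/{slug}.
function toPlaybackView(s: SubmissionEntity): PlaybackView | null {
  // Assumption: only completed submissions with audio are playable.
  if (s.status !== "completed" || s.slug === null || s.audioUrl === null) return null;
  // Allow-list: fields are copied one by one, so new entity fields cannot leak by accident.
  return { slug: s.slug, audioUrl: s.audioUrl, text: s.text, createdAt: s.createdAt };
}
```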

Get All Submissions (Admin)

Access Control

This endpoint is available to both clients and admins, but returns different results:

Admins: See all submissions from all users
Clients: Automatically filtered to show only their own submissions

Endpoint: GET /api/v1/text-to-speech

Authentication: Required (Client or Admin role)

Headers:

Authorization: Bearer {{admin-access-token}}
Content-Type: application/json

Query Parameters:

Parameter	Type	Required	Default	Description
`page`	integer	No	1	Page number for pagination
`pageSize`	integer	No	20	Number of items per page (max 100)
`status`	string	No	all	Filter by status: `pending`, `processing`, `completed`, `all`
`clientId`	string	No	-	Filter by specific client ID
`sortBy`	string	No	createdAt	Sort field: `createdAt`, `characterCount`, `processingTime`
`sortOrder`	string	No	desc	Sort order: `asc` or `desc`
`search`	string	No	-	Search in text content or slug
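The query parameters can be assembled with the standard URLSearchParams class, as in this sketch. buildListUrl is a hypothetical helper. It only sends parameters the caller sets, so the defaults in the table apply otherwise, and it rejects a pageSize outside 1-100 before the request is made (the page >= 1 check is an assumption, since the table does not state a minimum).

```typescript
type StatusFilter = "pending" | "processing" | "completed" | "all";

interface ListQuery {
  page?: number;
  pageSize?: number;
  status?: StatusFilter;
  clientId?: string;
  sortBy?: "createdAt" | "characterCount" | "processingTime";
  sortOrder?: "asc" | "desc";
  search?: string;
}

// Hypothetical helper: builds the GET /api/v1/text-to-speech URL from the documented query parameters.
function buildListUrl(baseUrl: string, query: ListQuery): string {
  const { page, pageSize } = query;
  if (pageSize !== undefined && (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > 100)) {
    throw new RangeError("pageSize must be an integer between 1 and 100");
  }
  // Assumption: pages are 1-based (the table only gives the default of 1).
  if (page !== undefined && (!Number.isInteger(page) || page < 1)) {
    throw new RangeError("page must be a positive integer");
  }
  const params = new URLSearchParams();
  // Skip unset fields so the server-side defaults (page 1, pageSize 20, createdAt desc, ...) apply.
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  return `${baseUrl}/api/v1/text-to-speech${qs ? `?${qs}` : ""}`;
}
```

Values are form-encoded, so a search for happy birthday is sent as search=happy+birthday.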

Request Example:

# Get all submissions (first page)
curl -X GET "http://localhost:5000/api/v1/text-to-speech" \
  -H "Authorization: Bearer {{admin-access-token}}"

# Get pending submissions only
curl -X GET "http://localhost:5000/api/v1/text-to-speech?status=pending" \
  -H "Authorization: Bearer {{admin-access-token}}"

# Get page 2 with 50 items per page, sorted by character count
curl -X GET "http://localhost:5000/api/v1/text-to-speech?page=2&pageSize=50&sortBy=characterCount&sortOrder=desc" \
  -H "Authorization: Bearer {{admin-access-token}}"

# Search submissions containing "welcome"
curl -X GET "http://localhost:5000/api/v1/text-to-speech?search=welcome" \
  -H "Authorization: Bearer {{admin-access-token}}"

# Filter by client ID
curl -X GET "http://localhost:5000/api/v1/text-to-speech?clientId=client-abc-123" \
  -H "Authorization: Bearer {{admin-access-token}}"

Response: 200 OK

{
  "success": true,
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "text": "Welcome to our audio service. Listen to this message.",
      "voiceId": "rachel-voice-id-123",
      "clientId": "client-abc-123",
      "slug": "welcome-audio-service-1706882300456",
      "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
      "status": "completed",
      "characterCount": 53,
      "processingTime": 5420,
      "createdAt": "2024-01-22T14:30:00Z",
      "createdBy": {
        "userId": "user-id-456",
        "userName": "John Doe"
      }
    },
    {
      "id": "660e8400-e29b-41d4-a716-446655440001",
      "text": "Happy birthday Sarah! Wishing you a wonderful year ahead.",
      "voiceId": "adam-voice-id-456",
      "clientId": "client-def-456",
      "slug": "happy-birthday-sarah-1706882400789",
      "audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/happy-birthday-sarah-1706882400789.mp3",
      "status": "completed",
      "characterCount": 57,
      "processingTime": 4200,
      "createdAt": "2024-01-22T14:25:00Z",
      "createdBy": {
        "userId": "user-id-789",
        "userName": "Jane Smith"
      }
    },
    {
      "id": "770e8400-e29b-41d4-a716-446655440002",
      "text": "This is a test message for the audio service.",
      "voiceId": "bella-voice-id-789",
      "clientId": "client-abc-123",
      "slug": null,
      "audioUrl": null,
      "status": "pending",
      "characterCount": 45,
      "processingTime": 0,
      "createdAt": "2024-01-22T14:20:00Z",
      "createdBy": {
        "userId": "user-id-456",
        "userName": "John Doe"
      }
    }
  ],
  "pagination": {
    "currentPage": 1,
    "pageSize": 20,
    "totalItems": 3,
    "totalPages": 1,
    "hasNextPage": false,
    "hasPreviousPage": false
  },
  "filters": {
    "status": "all",
    "sortBy": "createdAt",
    "sortOrder": "desc"
  }
}

Response Fields:

data (array) - Array of text-to-speech submission entities
pagination (object) - Pagination metadata
- currentPage (integer) - Current page number
- pageSize (integer) - Items per page
- totalItems (integer) - Total number of submissions matching filters
- totalPages (integer) - Total number of pages
- hasNextPage (boolean) - Whether there's a next page
- hasPreviousPage (boolean) - Whether there's a previous page
filters (object) - Applied filters and sorting
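The pagination fields follow from totalItems, currentPage and pageSize. Below is a sketch of that arithmetic with hypothetical function names; reporting totalPages as 0 for an empty result is an assumption, since the document does not show that case.

```typescript
interface Pagination {
  currentPage: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNextPage: boolean;
  hasPreviousPage: boolean;
}

// Hypothetical helper producing the `pagination` object of the list response.
function buildPagination(totalItems: number, currentPage: number, pageSize: number): Pagination {
  // Assumption: an empty result reports 0 pages.
  const totalPages = Math.ceil(totalItems / pageSize);
  return {
    currentPage,
    pageSize,
    totalItems,
    totalPages,
    hasNextPage: currentPage < totalPages,
    hasPreviousPage: currentPage > 1,
  };
}

// Number of rows to skip for a 1-based page, e.g. for a SQL OFFSET clause.
function pageOffset(currentPage: number, pageSize: number): number {
  return (currentPage - 1) * pageSize;
}
```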

Access Control Behavior:

Admin users: Return all submissions from all clients (no filtering)
Client users: Automatically filter by clientId to show only their own submissions
Authentication is required for both roles
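The access-control rule can be captured in one function that decides which clientId filter a query must use. In this sketch (hypothetical names), an admin may optionally narrow the list with the clientId query parameter, while a client is always pinned to the clientId from their JWT. Silently ignoring a client's attempt to pass another clientId, rather than returning 403, is an assumption.

```typescript
type Role = "admin" | "client";

interface Caller {
  role: Role;
  clientId: string; // taken from the JWT, never from the query string
}

interface SubmissionRow {
  id: string;
  clientId: string;
}

// Returns the clientId the query must filter on, or null for "all clients".
function effectiveClientFilter(caller: Caller, requestedClientId?: string): string | null {
  if (caller.role === "admin") return requestedClientId ?? null;
  // Assumption: a client's own clientId always wins; a different ?clientId= is ignored, not rejected.
  return caller.clientId;
}

// In-memory illustration of the same rule applied to a list of rows.
function scopeSubmissions<T extends SubmissionRow>(rows: T[], caller: Caller, requestedClientId?: string): T[] {
  const filter = effectiveClientFilter(caller, requestedClientId);
  return filter === null ? rows : rows.filter((row) => row.clientId === filter);
}
```

In a real service the filter from effectiveClientFilter would go into the database query itself; filtering in memory after a paginated fetch would make totalItems and the page contents wrong.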

Error Response: 400 Bad Request (Invalid parameters)

{
  "success": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid query parameters",
    "fields": {
      "pageSize": ["Page size must be between 1 and 100"],
      "status": ["Status must be one of: pending, processing, completed, all"]
    }
  }
}

Notes:

Submissions are sorted by createdAt (newest first) by default
Admins can view submissions from all clients
Clients automatically see only their own submissions (filtered by clientId from JWT token)
Maximum page size is 100 to prevent performance issues

Validation Rules

Text

Required: Yes
Type: String
Min Length: 1 character
Max Length: 1000 characters
Pattern: All printable characters allowed
Error Messages:
- Empty: "Text is required"
- Too short: "Text must be at least 1 character"
- Too long: "Text must not exceed 1000 characters"

Voice ID

Required: Yes
Type: String (UUID format)
Validation: Must exist in Voice Gallery and be published
Error Messages:
- Missing: "Voice ID is required"
- Invalid: "Voice with ID 'xxx' not found or not published"
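The Text and Voice ID rules map onto the fields object of the 400 VALIDATION_ERROR response. The sketch below produces that object using the error messages listed above. validateSubmissionInput is a hypothetical helper, counting characters as Unicode code points is an assumption, and the "exists and is published" check is left out because it needs the Voice Gallery and is reported separately as VOICE_NOT_FOUND.

```typescript
interface SubmissionInput {
  text?: unknown;
  voiceId?: unknown;
}

type FieldErrors = Record<string, string[]>;

const MAX_TEXT_LENGTH = 1000;

// Hypothetical helper: returns the `fields` map for a VALIDATION_ERROR response ({} when valid).
function validateSubmissionInput(input: SubmissionInput): FieldErrors {
  const fields: FieldErrors = {};
  const add = (field: string, message: string): void => {
    (fields[field] = fields[field] ?? []).push(message);
  };

  if (typeof input.text !== "string" || input.text.length === 0) {
    add("text", "Text is required");
  } else if (Array.from(input.text).length > MAX_TEXT_LENGTH) {
    // Assumption: count code points, so an emoji counts as one character, not two UTF-16 units.
    add("text", `Text must not exceed ${MAX_TEXT_LENGTH} characters`);
  }

  if (typeof input.voiceId !== "string" || input.voiceId.length === 0) {
    add("voiceId", "Voice ID is required");
  }
  return fields;
}
```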

Client ID

Auto-Populated: Yes (from JWT token, NOT sent in request)
Type: String
Source: Extracted from authenticated user's JWT token

Slug

Auto-Generated: Yes (server-side, generated when status changes to completed)
Type: String | Null
Initial Value: NULL (when submission is created)
Generation: Automatically created when status is updated to completed
- Generated from text content (URL-safe) with timestamp or UUID suffix (e.g., welcome-audio-123456789)
- Once generated, the slug never changes - it remains stable for the lifetime of the submission
- Used for shareable playback links (e.g., https://micdots.com/play/{{slug}})
Uniqueness: Must be unique across all submissions
Pattern: Lowercase letters, numbers, and hyphens only
Max Length: 200 characters

Important: The slug is automatically generated when the submission status changes to completed. This ensures every completed submission has a permanent, shareable playback link.
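The documented slug constraints fit in one regular expression plus a uniqueness check. In this sketch, checkNewSlug is a hypothetical helper and the in-memory Set stands in for a lookup against stored submissions; in a database, the uniqueness rule would typically also be backed by a unique index.

```typescript
// Documented rules: lowercase letters, digits and hyphens only; 1 to 200 characters.
const SLUG_RULE = /^[a-z0-9-]{1,200}$/;

type SlugCheck = "ok" | "invalid-format" | "already-taken";

// Hypothetical helper; `existingSlugs` stands in for a lookup against stored submissions.
function checkNewSlug(slug: string, existingSlugs: ReadonlySet<string>): SlugCheck {
  if (!SLUG_RULE.test(slug)) return "invalid-format";
  if (existingSlugs.has(slug)) return "already-taken";
  return "ok";
}
```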

Dependencies

AWS S3 Storage

Audio files are stored in AWS S3.

S3 Bucket Configuration:

Audio Bucket: micdots-audio (single bucket for all audio files)
Region: us-east-1
Access: Public read for audio files
Audio Format: MP3
Folder Structure:
- /voice-samples - Voice gallery audio samples
- /submissions - User TTS request audio files
File Naming: {{folder}}/{{slug}}.mp3
- The slug already contains uniqueness (timestamp/UUID suffix)
- Example: submissions/welcome-audio-123456789.mp3
- Example: voice-samples/rachel-sample-001.mp3
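The naming scheme reduces to string templates. The sketch below builds the object key, an object URL and the playback URL; the helper names are hypothetical. The examples in this document use both the virtual-hosted form https://micdots-audio.s3.amazonaws.com/<key> and the path-style form https://s3.amazonaws.com/micdots-audio/<key>; the sketch emits the former.

```typescript
type AudioFolder = "voice-samples" | "submissions";

const AUDIO_BUCKET = "micdots-audio";

// Object key inside the bucket: {folder}/{slug}.mp3
function audioObjectKey(folder: AudioFolder, slug: string): string {
  return `${folder}/${slug}.mp3`;
}

// Virtual-hosted-style URL for an object in the audio bucket.
function audioObjectUrl(key: string): string {
  return `https://${AUDIO_BUCKET}.s3.amazonaws.com/${key}`;
}

// Public playback page for a completed submission.
function playbackUrl(slug: string): string {
  return `https://micdots.com/play/${encodeURIComponent(slug)}`;
}
```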

Single Bucket Architecture

The same S3 bucket is used for both voice gallery samples and user submissions, organized in separate folders for better management and access control.

Integration Flow

Epic 1 Flow (Manual Processing)

Epic 1 Characteristics:

Manual audio processing by admin
Initial status: pending (slug is NULL)
Admin obtains audio from any external source
Admin uploads audio file to S3 using pre-signed URLs
Admin updates submission with audioUrl and status completed
Slug is auto-generated when status changes to completed
Single S3 bucket (micdots-audio)

Future Automation

In future releases, the flow will be automated with TTS API integration for immediate audio generation.

Security Considerations

Authorization

Create submissions (POST): Requires authentication
Update submissions (PUT): Requires authentication (only owner or admin)
Get submission by ID (GET /api/v1/text-to-speech/{id}): Public (no auth) - UUIDs provide security through obscurity
Get submission by slug (GET /api/v1/play/{slug}): Public (no auth) - designed for sharing
List submissions (GET /api/v1/text-to-speech): Requires authentication, role-based filtering:
- Clients: Only see their own submissions (filtered by clientId)
- Admins: See all submissions from all clients

Input Validation

Sanitize all text inputs to prevent XSS attacks
Validate voice ID exists and is published
Limit request rate to prevent abuse (10 requests per minute per user)

Rate Limiting

Create (POST): 10 requests per minute per authenticated user
Update (PUT): 20 requests per minute per authenticated user
Get by ID (GET /api/v1/text-to-speech/{id}): 1000 requests per minute (global, public endpoint)
Get by Slug (GET /api/v1/play/{slug}): 1000 requests per minute (global, public endpoint)
List submissions (GET /api/v1/text-to-speech): 100 requests per minute per authenticated user
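The limits above can be enforced with a per-key counter. The sketch below is a fixed-window limiter, one simple technique among several (sliding windows and token buckets are common alternatives). FixedWindowRateLimiter is hypothetical, keeps its state in memory only, and takes the current time as a parameter so its behavior is easy to test.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

// Hypothetical in-memory fixed-window limiter: at most `limit` requests per key per window.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and counts the request if it is allowed for `key` at time `nowMs`.
  tryAcquire(key: string, nowMs: number): boolean {
    const state = this.windows.get(key);
    if (state === undefined || nowMs - state.windowStart >= this.windowMs) {
      // First request for this key, or the previous window has expired: start a new window.
      this.windows.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (state.count < this.limit) {
      state.count += 1;
      return true;
    }
    return false;
  }
}
```

The user ID would serve as the key for the per-user limits (for example new FixedWindowRateLimiter(10, 60_000) for create requests), and one constant key for the global 1000-per-minute limits on the public GET endpoints. A fixed window can admit up to twice the limit across a window boundary, expired entries are never evicted here, and with several backend instances the counters would need to live in shared storage.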

Data Privacy

GET /api/v1/text-to-speech/{id}: Returns full submission details including user info (createdBy)
- UUIDs provide security through obscurity (unguessable)
- Consider user consent for data sharing
GET /api/v1/play/{slug}: Minimal data endpoint (no user info, no internal IDs)
- Only returns slug, audio URL, text, and timestamp
- Designed for public sharing without privacy concerns

Future Enhancements (Not in MVP)

Future Features

The following features are planned for future releases but are NOT included in Epic 1 MVP:

Text translation before speech generation
Multiple language support
Custom voice cloning
Batch text-to-speech processing
Audio editing and effects
Voice speed and pitch controls
SSML (Speech Synthesis Markup Language) support
Analytics tracking for audio playback
