Text-to-Speech Submission API

Text-to-speech submission system for converting user text input into AI-generated audio with shareable links.

BASE API ENDPOINT

/api/v1/text-to-speech

Epic 1 - TTS Submission and Get Link

Epic 1 provides manual TTS processing and shareable audio links:

  • User submits text + voice selection
  • Admin manually processes request (obtains audio from external source)
  • Admin uploads audio file to S3
  • User receives shareable link after admin completes processing
  • QR code generation deferred to future releases

User Flow: Submit Text → Manual Admin Processing → Admin Uploads Audio → Get Shareable Link


Text-to-Speech Submission Entity

Entity Schema

interface SubmissionEntity {
  id: string; // UUID - Submission identifier
  text: string; // User input text (max 1000 characters)
  voiceId: string; // Selected voice ID from Voice Gallery
  clientId: string; // Client identifier (auto-populated from JWT token)
  slug: string | null; // Unique URL-safe identifier (NULL until status is 'completed')
  audioUrl: string | null; // S3 URL of generated audio file (NULL until uploaded)
  status: string; // Submission status: "pending", "processing", "completed"
  characterCount: number; // Number of characters in text
  processingTime: number; // Time taken to generate (in milliseconds)
  createdAt: string; // ISO 8601 timestamp - Submission creation time
  createdBy: {
    userId: string; // User ID from Microsoft Identity
    userName: string; // User display name
  };
}

Field Descriptions:

  • id: Unique identifier (UUID) for the submission
  • text: User's input text for audio generation (max 1000 characters)
  • voiceId: ID of the selected voice from Voice Gallery
  • clientId: Client identifier (automatically populated from JWT token)
  • slug: Unique URL-safe identifier for sharing the submission (NULL until status is completed)
  • audioUrl: S3 URL of the generated audio file (NULL until uploaded)
  • status: Current submission status ("pending", "processing", "completed")
  • characterCount: Number of characters in the input text
  • processingTime: Time taken to generate audio (in milliseconds)
  • createdAt: Timestamp when the submission was created (ISO 8601 format)
  • createdBy: Information about the user who created the submission
    • userId: User ID from Microsoft Identity
    • userName: User's display name

Notes:

  • Audio files are manually generated by admin and uploaded to S3 (Epic 1)
  • Slug is auto-generated when status changes to completed: it is derived from the text content (URL-safe) and made unique by a timestamp/UUID suffix
  • Once generated, the slug never changes - provides stable shareable links
  • Playback URL format: https://micdots.com/play/{{slug}}
  • S3 filenames are based on the slug: {{slug}}.mp3 (same for both normal and dummy modes)
  • clientId and createdBy are automatically populated from the authenticated user's JWT token (NOT sent in request)
  • Processing time: Varies based on manual admin processing

Entity Example

{
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882400123",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882400123.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}

Notes:

  • The slug includes a timestamp suffix (1706882400123) to ensure uniqueness across all submissions
  • Once generated, the slug never changes

Form Fields Reference

The following fields are used in the submission form:

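| Field Name | Type   | Required | Description                                |
|------------|--------|----------|--------------------------------------------|
| text       | string | Yes      | User's text input (1-1000 characters)      |
| voiceId    | string | Yes      | Selected voice model ID from Voice Gallery |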

Auto-populated fields (NOT sent in request):

  • clientId - Automatically extracted from the authenticated user's JWT token
  • createdBy - Automatically populated with user ID and name from JWT token

API Endpoint

Submit Text-to-Speech Request

MVP 1 Feature

This endpoint is available in MVP 1 for authenticated users to submit text for audio generation. QR code generation is deferred to future releases.

Endpoint: POST /api/v1/text-to-speech

Headers:

Authorization: Bearer {{access-token}}
Content-Type: application/json

Request Body (JSON):

{
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123"
}

Form Fields: See Form Fields Reference above for complete field documentation.
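
As a rough illustration of the request above, the sketch below assembles the POST call as a plain descriptor that can be passed to any HTTP client. The helper name buildSubmitRequest and the TtsSubmitRequest type are hypothetical, and the base URL is supplied by the caller. Only text and voiceId go in the body, because clientId and createdBy are derived from the JWT.

```typescript
// Request descriptor that can be handed to fetch(url, init) or any HTTP client.
interface TtsSubmitRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the POST /api/v1/text-to-speech request.
function buildSubmitRequest(
  baseUrl: string,
  accessToken: string,
  text: string,
  voiceId: string,
): TtsSubmitRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/text-to-speech`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    // clientId and createdBy are deliberately absent: the API derives them from the token.
    body: JSON.stringify({ text, voiceId }),
  };
}
```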

Processing Flow (Epic 1 - Manual):

  1. Validate text length and voice ID

  2. Automatically populate clientId and createdBy from the authenticated user's JWT token (NOT sent in request)

  3. Create submission record with status pending (slug is NULL initially)

  4. Return submission entity (without audio URL or slug initially)

    Note: At this point, the user receives a submission confirmation and must wait while the admin processes the request.

  5. Admin processes manually:

    • Admin generates audio file offline from external source
    • Admin requests pre-signed URL: POST /api/v1/text-to-speech/upload-url
    • Admin uploads MP3 file directly to S3
    • Admin updates submission: PUT /api/v1/text-to-speech/{{id}} with audioUrl and status completed

  6. System automatically generates unique slug when status changes to completed:

    • Generated from text content with timestamp/UUID suffix (e.g., welcome-audio-123456789)
    • Slug is permanent and never changes after generation
    • Playback URL created: https://micdots.com/play/{{slug}}

Important: The slug is generated automatically when the submission status is updated to completed. This ensures every completed submission has a shareable playback link.

Future Automation

In future releases, the processing flow will be automated:

  • Automated text-to-speech generation (via TTS API)
  • Audio automatically uploaded to S3
  • Immediate completion (no manual admin processing)

Note: clientId and createdBy are NOT sent in the request. The API automatically extracts the client ID, user ID, and user name from the authenticated user's JWT token and populates these fields.

Response: 201 Created

{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": null,
"audioUrl": null,
"status": "pending",
"characterCount": 54,
"processingTime": 0,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
},
"links": {
"share": null,
"audio": null
}
}

Error Response: 400 Bad Request (Validation Error)

{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Validation failed",
"fields": {
"text": ["Text must be between 1 and 1000 characters"],
"voiceId": ["Voice ID is required"]
}
}
}

Error Response: 404 Not Found (Invalid Voice ID)

{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID 'invalid-voice-id' not found or not published."
}
}

Error Response: 500 Internal Server Error (Processing Failed)

{
"success": false,
"error": {
"code": "PROCESSING_FAILED",
"message": "Failed to process request. Please try again."
}
}

Get Submission by ID

Universal Endpoint

This public endpoint serves both frontend and backoffice applications. No authentication required.

Endpoint: GET /api/v1/text-to-speech/{{id}}

Authentication: Not required (public endpoint)

Path Parameters:

  • id (string, required) - Submission UUID

Purpose: Universal endpoint that returns full submission details for both public playback and administrative views.

Use Cases:

  • Frontend Public: /play/[id] route for audio playback by ID
  • Frontend Client: "My Requests" details page (authenticated users viewing their requests)
  • Frontend Backoffice: Admin request details page (admins viewing any request)
  • Returns: Complete submission data including user info, audio URL, status, and metadata

Why Public?

  • UUIDs are unguessable (e.g., 550e8400-e29b-41d4-a716-446655440000)
  • Single endpoint serves multiple frontend use cases
  • Simplifies frontend architecture (no need for separate authenticated endpoint)

Request Example:

GET /api/v1/text-to-speech/550e8400-e29b-41d4-a716-446655440000

Response: 200 OK

{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}

Error Response: 404 Not Found (Submission not found)

{
"success": false,
"error": {
"code": "SUBMISSION_NOT_FOUND",
"message": "Submission with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}
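
The success and error examples in this document share one envelope: success plus either data or error. A minimal sketch of how a client might type and unwrap that envelope follows. The names TtsApiError, TtsEnvelope, and unwrapEnvelope are illustrative and not part of the API.

```typescript
// Error payload as shown in the error examples; `fields` only appears on validation errors.
interface TtsApiError {
  code: string;
  message: string;
  fields?: Record<string, string[]>;
}

// Envelope shared by the documented success and error responses.
type TtsEnvelope<T> =
  | { success: true; data: T }
  | { success: false; error: TtsApiError };

// Hypothetical helper: returns `data` on success, throws with the API error code otherwise.
function unwrapEnvelope<T>(envelope: TtsEnvelope<T>): T {
  if (envelope.success) {
    return envelope.data;
  }
  throw new Error(`${envelope.error.code}: ${envelope.error.message}`);
}
```

Because the union is discriminated on success, the compiler narrows to data or error without casts.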

Testing with cURL

# Get submission by ID (no authentication required)
curl -X GET "http://localhost:5000/api/v1/text-to-speech/550e8400-e29b-41d4-a716-446655440000"

# Get submission by slug (minimal data)
curl -X GET "http://localhost:5000/api/v1/play/welcome-audio-service-1706882300456"

Expected Response (GET by ID - full details):

{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}

Expected Response (GET by slug - minimal data):

{
"success": true,
"data": {
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"text": "Welcome to our audio service. Listen to this message.",
"createdAt": "2024-01-22T14:30:00Z"
}
}

File Upload Strategy: S3 Pre-Signed URLs

Audio files can be uploaded directly to S3 using pre-signed URLs for better performance and scalability.

Why Pre-Signed URLs?

Benefits:

  • Direct to S3: Files upload directly to S3, bypassing backend
  • Faster uploads: No backend bottleneck
  • Scalable: Backend doesn't handle large file streams
  • Secure: Pre-signed URLs expire after 15 minutes
  • Progress tracking: Frontend can show upload progress

Upload Flow:

  1. Request pre-signed URL from backend
  2. Upload file directly to S3 using the pre-signed URL
  3. Update submission with the S3 URL

Get Pre-Signed Upload URL

Endpoint: POST /api/v1/text-to-speech/upload-url

Description: Generates a pre-signed URL for uploading audio files directly to S3.

Headers:

Authorization: Bearer {{access-token}}
Content-Type: application/json

Request Body:

{
"fileName": "custom-audio.mp3",
"fileType": "audio/mpeg",
"fileSize": 3145728
}

Request Body Fields:

  • fileName (string, required) - Original filename with extension
  • fileType (string, required) - MIME type (must be audio/mpeg for MP3)
  • fileSize (number, required) - File size in bytes (max 10MB = 10485760 bytes)

Response: 200 OK

{
"success": true,
"data": {
"uploadUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
"fileUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
"fileKey": "submissions/550e8400-custom.mp3",
"expiresIn": 900
},
"message": "Pre-signed URL generated successfully. Upload expires in 15 minutes."
}

Note: The file path includes the /submissions folder prefix for organizing TTS request audio files separately from voice gallery samples.

Response Fields:

  • uploadUrl - Pre-signed URL for uploading (use this with PUT request)
  • fileUrl - Final S3 URL after upload completes (use this when updating submission)
  • fileKey - S3 object key
  • expiresIn - Seconds until URL expires (900 = 15 minutes)

Error Response: 400 Bad Request (Invalid file type)

{
"success": false,
"error": {
"code": "INVALID_FILE_TYPE",
"message": "Only MP3 files are allowed. Received: audio/wav"
}
}

Error Response: 400 Bad Request (File too large)

{
"success": false,
"error": {
"code": "FILE_TOO_LARGE",
"message": "File size 12582912 bytes exceeds maximum of 10485760 bytes (10MB)"
}
}
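
The two rejection cases above can be checked on the client before requesting a URL. The sketch below mirrors the documented rules (audio/mpeg only, at most 10485760 bytes). The function name checkUploadRequest and the order of the checks are assumptions; the server remains the source of truth.

```typescript
const MAX_AUDIO_BYTES = 10 * 1024 * 1024; // 10485760, the documented 10MB limit

interface UploadUrlRequest {
  fileName: string;
  fileType: string;
  fileSize: number;
}

// Returns the documented error code for a rejected request, or null when it passes.
function checkUploadRequest(
  req: UploadUrlRequest,
): "INVALID_FILE_TYPE" | "FILE_TOO_LARGE" | null {
  if (req.fileType !== "audio/mpeg") return "INVALID_FILE_TYPE";
  if (req.fileSize > MAX_AUDIO_BYTES) return "FILE_TOO_LARGE";
  return null;
}
```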

Upload File to S3 (Client-Side)

After receiving the pre-signed URL, upload the file directly to S3:

Request: PUT {{uploadUrl}}

Headers:

Content-Type: audio/mpeg

Body: Binary audio file data

Testing with cURL:

# Step 1: Get pre-signed URL
curl -X POST "http://localhost:5000/api/v1/text-to-speech/upload-url" \
-H "Authorization: Bearer {{access-token}}" \
-H "Content-Type: application/json" \
-d '{
"fileName": "custom-audio.mp3",
"fileType": "audio/mpeg",
"fileSize": 3145728
}'

# Step 2: Upload file to S3 (use uploadUrl from previous response)
curl -X PUT "{{upload-url-from-previous-response}}" \
-H "Content-Type: audio/mpeg" \
--data-binary "@/path/to/custom-audio.mp3"

# Step 3: Update submission with the fileUrl
curl -X PUT "http://localhost:5000/api/v1/text-to-speech/{{id}}" \
-H "Authorization: Bearer {{access-token}}" \
-H "Content-Type: application/json" \
-d '{
"audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
"status": "completed"
}'
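
The same three steps can be sketched in TypeScript. To stay independent of any particular HTTP client, the sketch takes a caller-supplied send function, which is assumed to return the parsed JSON response body. The names HttpCall, HttpSend, and uploadAndComplete are hypothetical.

```typescript
// Minimal HTTP abstraction so the sketch does not depend on a specific client.
interface HttpCall {
  method: "POST" | "PUT";
  url: string;
  headers: Record<string, string>;
  body: string | Uint8Array;
}
type HttpSend = (call: HttpCall) => Promise<unknown>;

// Hypothetical helper covering steps 1-3: request URL, upload to S3, mark completed.
async function uploadAndComplete(
  send: HttpSend,
  baseUrl: string,
  accessToken: string,
  submissionId: string,
  fileName: string,
  audio: Uint8Array,
): Promise<string> {
  const jsonHeaders = {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  };

  // Step 1: ask the backend for a pre-signed URL.
  const presigned = (await send({
    method: "POST",
    url: `${baseUrl}/api/v1/text-to-speech/upload-url`,
    headers: jsonHeaders,
    body: JSON.stringify({ fileName, fileType: "audio/mpeg", fileSize: audio.byteLength }),
  })) as { data: { uploadUrl: string; fileUrl: string } };

  // Step 2: PUT the bytes straight to S3. No Authorization header: the URL is already signed.
  await send({
    method: "PUT",
    url: presigned.data.uploadUrl,
    headers: { "Content-Type": "audio/mpeg" },
    body: audio,
  });

  // Step 3: attach the final S3 URL and mark the submission completed.
  await send({
    method: "PUT",
    url: `${baseUrl}/api/v1/text-to-speech/${submissionId}`,
    headers: jsonHeaders,
    body: JSON.stringify({ audioUrl: presigned.data.fileUrl, status: "completed" }),
  });

  return presigned.data.fileUrl;
}
```

Step 2 must finish within the expiresIn window returned by step 1, so a real client would request the URL immediately before uploading.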

Update Submission

Endpoint: PUT /api/v1/text-to-speech/{{id}}

Path Parameters:

  • id (string, required) - Submission UUID

Headers:

Authorization: Bearer {{access-token}}
Content-Type: application/json

Request Body (JSON):

{
"voiceId": "bella-voice-id-789",
"audioUrl": "https://micdots-audio.s3.amazonaws.com/submissions/550e8400-custom.mp3",
"status": "completed"
}

Text Field is Read-Only

The text field cannot be modified after submission. Once a request is created, the text content is read-only. This ensures data integrity and prevents confusion during processing.

Request Body Fields (all optional):

  • voiceId (string, optional) - Selected voice model ID
  • audioUrl (string, optional) - S3 URL of uploaded audio file
  • status (string, optional) - Submission status: "pending", "processing", "completed"

Processing Flow:

  1. Validate request fields
  2. If voiceId provided, verify it exists in Voice Gallery
  3. If audioUrl provided, update audio file URL (see File Upload Strategy for uploading files)
  4. If status changes to completed: Automatically generate unique slug from text content with timestamp/UUID suffix
  5. Automatically update updatedAt timestamp
  6. Update submission record
  7. Return updated submission entity (includes generated slug if status is completed)

Note: Only the submission owner or admin users can update submissions.
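
The update rules above (text stays fixed, the slug is generated once on completion and never changes) can be sketched as a pure function. This is an illustration under assumptions, not the service's implementation: the type and function names are hypothetical, the slug generator is passed in by the caller, and permission checks and voice lookup are left out.

```typescript
type TtsStatus = "pending" | "processing" | "completed";

interface UpdatableSubmission {
  text: string;
  voiceId: string;
  status: TtsStatus;
  slug: string | null;
  audioUrl: string | null;
}

// All fields optional, matching the documented PUT body. There is no `text` field.
interface SubmissionUpdate {
  voiceId?: string;
  audioUrl?: string;
  status?: TtsStatus;
}

// Applies an update without mutating the input. `makeSlug` is a caller-supplied
// (hypothetical) generator; it runs only when a completed submission has no slug yet.
function applySubmissionUpdate(
  current: UpdatableSubmission,
  update: SubmissionUpdate,
  makeSlug: (text: string) => string,
): UpdatableSubmission {
  const next: UpdatableSubmission = {
    ...current, // text is copied unchanged: it is read-only after creation
    voiceId: update.voiceId ?? current.voiceId,
    audioUrl: update.audioUrl ?? current.audioUrl,
    status: update.status ?? current.status,
  };
  // Generate the slug once; an existing slug is never replaced.
  if (next.status === "completed" && next.slug === null) {
    next.slug = makeSlug(next.text);
  }
  return next;
}
```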

Upload Audio Files

To upload audio files, use the S3 Pre-Signed URL approach. This allows you to upload files directly to S3, then update the submission with the audioUrl.

Response: 200 OK

{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "bella-voice-id-789",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}

Error Response: 404 Not Found

{
"success": false,
"error": {
"code": "SUBMISSION_NOT_FOUND",
"message": "Submission with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}

Error Response: 403 Forbidden

{
"success": false,
"error": {
"code": "FORBIDDEN",
"message": "You do not have permission to update this submission."
}
}

Error Response: 400 Bad Request (Invalid Voice ID)

{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID 'invalid-voice-id' not found or not published."
}
}

Get Submission by Slug (Public)

Endpoint: GET /api/v1/play/{{slug}}

Path Parameters:

  • slug (string, required) - Unique URL-safe slug

Authentication: Not required (public endpoint)

Purpose: Optimized endpoint for public playback pages that only need minimal data for audio playback.

Use Cases:

  • Frontend: /play/[slug] route for shareable audio links
  • Returns limited information for privacy (no user data, no internal IDs)

ID vs Slug

  • Use GET /api/v1/text-to-speech/{id} for full submission details (frontend dashboards, backoffice)
  • Use GET /api/v1/play/{slug} for minimal playback data (shareable public links)

Both endpoints are public and serve the same audio, but return different levels of detail.

Request Example:

GET /api/v1/play/happy-birthday-john

Response: 200 OK

{
"success": true,
"data": {
"slug": "happy-birthday-john",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/happy-birthday-john.mp3",
"text": "Happy birthday John! Wishing you all the best on your special day.",
"createdAt": "2024-01-22T14:30:00Z"
}
}

Note: This endpoint returns limited information (no user data, no internal IDs) for public playback, while GET /api/v1/text-to-speech/{id} returns complete submission details.
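
One way to picture the difference between the two responses is as a projection from the full entity to the four public fields. The sketch below is illustrative only. The names FullSubmission, PublicPlayback, and toPublicPlayback are hypothetical, and returning null for a submission without a slug is an assumption, since this document does not say how the endpoint treats incomplete submissions.

```typescript
interface FullSubmission {
  id: string;
  text: string;
  voiceId: string;
  clientId: string;
  slug: string | null;
  audioUrl: string | null;
  status: string;
  characterCount: number;
  processingTime: number;
  createdAt: string;
  createdBy: { userId: string; userName: string };
}

interface PublicPlayback {
  slug: string;
  audioUrl: string;
  text: string;
  createdAt: string;
}

// Returns the four public fields, or null when the submission has no slug or audio yet.
function toPublicPlayback(s: FullSubmission): PublicPlayback | null {
  if (s.slug === null || s.audioUrl === null) return null;
  // Build the object field by field so nothing else (id, clientId, createdBy) can leak.
  return { slug: s.slug, audioUrl: s.audioUrl, text: s.text, createdAt: s.createdAt };
}
```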


Get All Submissions (Client and Admin)

Access Control

This endpoint is available to both clients and admins, but returns different results:

  • Admins: See all submissions from all users
  • Clients: Automatically filtered to show only their own submissions

Endpoint: GET /api/v1/text-to-speech

Authentication: Required (Client or Admin role)

Headers:

Authorization: Bearer {{admin-access-token}}
Content-Type: application/json

Query Parameters:

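| Parameter | Type    | Required | Default   | Description                                            |
|-----------|---------|----------|-----------|--------------------------------------------------------|
| page      | integer | No       | 1         | Page number for pagination                             |
| pageSize  | integer | No       | 20        | Number of items per page (max 100)                     |
| status    | string  | No       | all       | Filter by status: pending, processing, completed, all  |
| clientId  | string  | No       | -         | Filter by specific client ID                           |
| sortBy    | string  | No       | createdAt | Sort field: createdAt, characterCount, processingTime  |
| sortOrder | string  | No       | desc      | Sort order: asc or desc                                |
| search    | string  | No       | -         | Search in text content or slug                         |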

Request Example:

# Get all submissions (first page)
curl -X GET "http://localhost:5000/api/v1/text-to-speech" \
-H "Authorization: Bearer {{admin-access-token}}"

# Get pending submissions only
curl -X GET "http://localhost:5000/api/v1/text-to-speech?status=pending" \
-H "Authorization: Bearer {{admin-access-token}}"

# Get page 2 with 50 items per page, sorted by character count
curl -X GET "http://localhost:5000/api/v1/text-to-speech?page=2&pageSize=50&sortBy=characterCount&sortOrder=desc" \
-H "Authorization: Bearer {{admin-access-token}}"

# Search submissions containing "welcome"
curl -X GET "http://localhost:5000/api/v1/text-to-speech?search=welcome" \
-H "Authorization: Bearer {{admin-access-token}}"

# Filter by client ID
curl -X GET "http://localhost:5000/api/v1/text-to-speech?clientId=client-abc-123" \
-H "Authorization: Bearer {{admin-access-token}}"
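
For clients that build these URLs programmatically, a typed query builder keeps parameter names and allowed values in one place. The sketch below is a possible shape. ListSubmissionsQuery and buildListPath are hypothetical names, and the parameter set is taken from the query parameter list above.

```typescript
interface ListSubmissionsQuery {
  page?: number;
  pageSize?: number;
  status?: "pending" | "processing" | "completed" | "all";
  clientId?: string;
  sortBy?: "createdAt" | "characterCount" | "processingTime";
  sortOrder?: "asc" | "desc";
  search?: string;
}

// Serializes only the parameters that were set, so server defaults apply to the rest.
function buildListPath(query: ListSubmissionsQuery = {}): string {
  const order: (keyof ListSubmissionsQuery)[] = [
    "page", "pageSize", "status", "clientId", "sortBy", "sortOrder", "search",
  ];
  const pairs: string[] = [];
  for (const key of order) {
    const value = query[key];
    if (value !== undefined) {
      // Percent-encode values so free-text search terms are safe in a URL.
      pairs.push(`${key}=${encodeURIComponent(String(value))}`);
    }
  }
  const path = "/api/v1/text-to-speech";
  return pairs.length > 0 ? `${path}?${pairs.join("&")}` : path;
}
```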

Response: 200 OK

{
"success": true,
"data": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our audio service. Listen to this message.",
"voiceId": "rachel-voice-id-123",
"clientId": "client-abc-123",
"slug": "welcome-audio-service-1706882300456",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/welcome-audio-service-1706882300456.mp3",
"status": "completed",
"characterCount": 54,
"processingTime": 5420,
"createdAt": "2024-01-22T14:30:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
},
{
"id": "660e8400-e29b-41d4-a716-446655440001",
"text": "Happy birthday Sarah! Wishing you a wonderful year ahead.",
"voiceId": "adam-voice-id-456",
"clientId": "client-def-456",
"slug": "happy-birthday-sarah-1706882400789",
"audioUrl": "https://s3.amazonaws.com/micdots-audio/submissions/happy-birthday-sarah-1706882400789.mp3",
"status": "completed",
"characterCount": 58,
"processingTime": 4200,
"createdAt": "2024-01-22T14:25:00Z",
"createdBy": {
"userId": "user-id-789",
"userName": "Jane Smith"
}
},
{
"id": "770e8400-e29b-41d4-a716-446655440002",
"text": "This is a test message for the audio service.",
"voiceId": "bella-voice-id-789",
"clientId": "client-abc-123",
"slug": null,
"audioUrl": null,
"status": "pending",
"characterCount": 46,
"processingTime": 0,
"createdAt": "2024-01-22T14:20:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
],
"pagination": {
"currentPage": 1,
"pageSize": 20,
"totalItems": 3,
"totalPages": 1,
"hasNextPage": false,
"hasPreviousPage": false
},
"filters": {
"status": "all",
"sortBy": "createdAt",
"sortOrder": "desc"
}
}

Response Fields:

  • data (array) - Array of text-to-speech submission entities
  • pagination (object) - Pagination metadata
    • currentPage (integer) - Current page number
    • pageSize (integer) - Items per page
    • totalItems (integer) - Total number of submissions matching filters
    • totalPages (integer) - Total number of pages
    • hasNextPage (boolean) - Whether there's a next page
    • hasPreviousPage (boolean) - Whether there's a previous page
  • filters (object) - Applied filters and sorting
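
The pagination fields relate to each other arithmetically. The sketch below shows one plausible derivation, assuming totalPages is the ceiling of totalItems / pageSize and pages are numbered from 1. The names PaginationMeta and buildPaginationMeta are hypothetical.

```typescript
interface PaginationMeta {
  currentPage: number;
  pageSize: number;
  totalItems: number;
  totalPages: number;
  hasNextPage: boolean;
  hasPreviousPage: boolean;
}

// Derives the pagination block from the total match count and the requested page.
function buildPaginationMeta(
  totalItems: number,
  currentPage: number,
  pageSize: number,
): PaginationMeta {
  // Assumption: a partially filled last page still counts as a page.
  const totalPages = Math.ceil(totalItems / pageSize);
  return {
    currentPage,
    pageSize,
    totalItems,
    totalPages,
    hasNextPage: currentPage < totalPages,
    hasPreviousPage: currentPage > 1,
  };
}
```

With the values from the response example (3 items, page 1, page size 20) this yields totalPages 1 and both flags false.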

Access Control Behavior:

  • Admin users: Return all submissions from all clients (no filtering)
  • Client users: Automatically filter by clientId to show only their own submissions
  • Authentication is required for both roles
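
The role-based behavior above reduces to a small decision: which clientId filter the list query actually runs with. A minimal sketch follows, assuming the role and client ID have already been read from the JWT and that a clientId query parameter sent by a client user is ignored. The names CallerRole and resolveClientFilter are hypothetical.

```typescript
type CallerRole = "admin" | "client";

// Resolves the clientId filter for the list query.
// Admins may pass any clientId (or none, meaning all clients);
// clients are always pinned to the clientId from their own token.
function resolveClientFilter(
  role: CallerRole,
  tokenClientId: string,
  requestedClientId?: string,
): string | undefined {
  if (role === "admin") return requestedClientId;
  return tokenClientId;
}
```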

Error Response: 400 Bad Request (Invalid parameters)

{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid query parameters",
"fields": {
"pageSize": ["Page size must be between 1 and 100"],
"status": ["Status must be one of: pending, processing, completed, all"]
}
}
}

Notes:

  • Submissions are sorted by createdAt (newest first) by default
  • Admins can view submissions from all clients
  • Clients automatically see only their own submissions (filtered by clientId from JWT token)
  • Maximum page size is 100 to prevent performance issues

Validation Rules

Text

  • Required: Yes
  • Type: String
  • Min Length: 1 character
  • Max Length: 1000 characters
  • Pattern: All printable characters allowed
  • Error Messages:
    • Empty: "Text is required"
    • Too short: "Text must be at least 1 character"
    • Too long: "Text must not exceed 1000 characters"

Voice ID

  • Required: Yes
  • Type: String (UUID format)
  • Validation: Must exist in Voice Gallery and be published
  • Error Messages:
    • Missing: "Voice ID is required"
    • Invalid: "Voice with ID 'xxx' not found or not published"
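
The text and voice ID rules that do not need a server lookup can be mirrored in the submission form. The sketch below returns errors in the same fields shape as the 400 response. The name validateSubmissionInput is hypothetical, and length is counted in UTF-16 code units, which is an assumption about how the server counts characters. Whether a voice exists and is published can only be checked against the Voice Gallery.

```typescript
type FieldErrors = Record<string, string[]>;

// Local checks only: voice existence and publication need a Voice Gallery lookup.
function validateSubmissionInput(input: { text?: string; voiceId?: string }): FieldErrors {
  const errors: FieldErrors = {};
  const text = input.text ?? "";
  if (text.length === 0) {
    errors["text"] = ["Text is required"];
  } else if (text.length > 1000) {
    errors["text"] = ["Text must not exceed 1000 characters"];
  }
  if (!input.voiceId) {
    errors["voiceId"] = ["Voice ID is required"];
  }
  return errors;
}
```

An empty result object means the input passed the local checks.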

Client ID

  • Auto-Populated: Yes (from JWT token, NOT sent in request)
  • Type: String
  • Source: Extracted from authenticated user's JWT token

Slug

  • Auto-Generated: Yes (server-side, generated when status changes to completed)
  • Type: String | Null
  • Initial Value: NULL (when submission is created)
  • Generation: Automatically created when status is updated to completed
    • Generated from text content (URL-safe) with timestamp or UUID suffix (e.g., welcome-audio-123456789)
    • Once generated, the slug never changes - it remains stable for the lifetime of the submission
    • Used for shareable playback links (e.g., https://micdots.com/play/{{slug}})
  • Uniqueness: Must be unique across all submissions
  • Pattern: Lowercase letters, numbers, and hyphens only
  • Max Length: 200 characters

Important: The slug is automatically generated when the submission status changes to completed. This ensures every completed submission has a permanent, shareable playback link.
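
To make the slug rules concrete, here is a minimal sketch of a generator that satisfies the documented pattern (lowercase letters, numbers, hyphens) and the 200-character limit. It is not the service's actual algorithm: the function name generateSlug, the choice to keep the first four words, and the use of a millisecond timestamp as the suffix are all assumptions. The real service may select words differently or use a UUID suffix.

```typescript
const MAX_SLUG_LENGTH = 200;

// Hypothetical generator: slugifies the leading words of the text and appends a
// millisecond timestamp. Which words the real service keeps is not documented.
function generateSlug(text: string, timestampMs: number, maxWords = 4): string {
  const suffix = String(timestampMs);
  const base = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything outside a-z/0-9 into one hyphen
    .replace(/^-+|-+$/g, "") // trim leading and trailing hyphens
    .split("-")
    .filter((word) => word.length > 0)
    .slice(0, maxWords)
    .join("-")
    .slice(0, MAX_SLUG_LENGTH - suffix.length - 1) // leave room for "-" plus suffix
    .replace(/-+$/, ""); // truncation may leave a trailing hyphen
  return base.length > 0 ? `${base}-${suffix}` : suffix;
}
```

Uniqueness still has to be enforced by the datastore (for example with a unique index), since two submissions with the same leading words could be completed in the same millisecond.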


Dependencies

AWS S3 Storage

Audio files are stored in AWS S3.

S3 Bucket Configuration:

  • Audio Bucket: micdots-audio (single bucket for all audio files)
  • Region: us-east-1
  • Access: Public read for audio files
  • Audio Format: MP3
  • Folder Structure:
    • /voice-samples - Voice gallery audio samples
    • /submissions - User TTS request audio files
  • File Naming: {{folder}}/{{slug}}.mp3
    • The slug already contains uniqueness (timestamp/UUID suffix)
    • Example: submissions/welcome-audio-123456789.mp3
    • Example: voice-samples/rachel-sample-001.mp3

Single Bucket Architecture

The same S3 bucket is used for both voice gallery samples and user submissions, organized in separate folders for better management and access control.
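
The naming convention can be expressed as two small helpers, one for the S3 object key and one for the playback link. The function names are hypothetical. The folder names, key format, and playback URL format come from this document.

```typescript
type AudioFolder = "voice-samples" | "submissions";

// S3 object key following the documented {{folder}}/{{slug}}.mp3 naming.
function audioObjectKey(folder: AudioFolder, slug: string): string {
  return `${folder}/${slug}.mp3`;
}

// Shareable playback URL for a completed submission.
function playbackUrl(slug: string): string {
  return `https://micdots.com/play/${encodeURIComponent(slug)}`;
}
```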


Integration Flow

Epic 1 Flow (Manual Processing)

Epic 1 Characteristics:

  • Manual audio processing by admin
  • Initial status: pending (slug is NULL)
  • Admin obtains audio from any external source
  • Admin uploads audio file to S3 using pre-signed URLs
  • Admin updates submission with audioUrl and status completed
  • Slug is auto-generated when status changes to completed
  • Single S3 bucket (micdots-audio)

Future Automation

In future releases, the flow will be automated with TTS API integration for immediate audio generation.


Security Considerations

Authorization

  • Create submissions (POST): Requires authentication
  • Update submissions (PUT): Requires authentication (only owner or admin)
  • Get submission by ID (GET /api/v1/text-to-speech/{id}): Public (no auth) - UUIDs provide security through obscurity
  • Get submission by slug (GET /api/v1/play/{slug}): Public (no auth) - designed for sharing
  • List submissions (GET /api/v1/text-to-speech): Requires authentication, role-based filtering:
    • Clients: Only see their own submissions (filtered by clientId)
    • Admins: See all submissions from all clients

Input Validation

  • Sanitize all text inputs to prevent XSS attacks
  • Validate voice ID exists and is published
  • Limit request rate to prevent abuse (10 requests per minute per user)

Rate Limiting

  • Create (POST): 10 requests per minute per authenticated user
  • Update (PUT): 20 requests per minute per authenticated user
  • Get by ID (GET /api/v1/text-to-speech/{id}): 1000 requests per minute (global, public endpoint)
  • Get by Slug (GET /api/v1/play/{slug}): 1000 requests per minute (global, public endpoint)
  • List submissions (GET /api/v1/text-to-speech): 100 requests per minute per authenticated user
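
This document does not say how these limits are enforced. As one possible approach, the sketch below uses a fixed-window counter keyed by user ID (or a single global key for the public endpoints). The class name FixedWindowLimiter is hypothetical, the state is in-memory and single-process, and a production setup would more likely rely on a shared store or gateway-level limiting.

```typescript
// Fixed-window counter: one possible way to enforce "N requests per minute per key".
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed; `nowMs` is passed in so the logic is testable.
  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    // Start a fresh window on first use or once the previous window has elapsed.
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}
```

For example, new FixedWindowLimiter(10) keyed by user ID would correspond to the documented limit for POST, and new FixedWindowLimiter(1000) with one shared key to the public GET endpoints.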

Data Privacy

  • GET /api/v1/text-to-speech/{id}: Returns full submission details including user info (createdBy)
    • UUIDs provide security through obscurity (unguessable)
    • Consider user consent for data sharing
  • GET /api/v1/play/{slug}: Minimal data endpoint (no user info, no internal IDs)
    • Only returns slug, audio URL, text, and timestamp
    • Designed for public sharing without privacy concerns


Future Enhancements (Not in MVP)

Future Features

The following features are planned for future releases but are NOT included in Epic 1 MVP:

  • Text translation before speech generation
  • Multiple language support
  • Custom voice cloning
  • Batch text-to-speech processing
  • Audio editing and effects
  • Voice speed and pitch controls
  • SSML (Speech Synthesis Markup Language) support
  • Analytics tracking for audio playback