Pre-Requirements

Before starting the ElevenLabs research, the client must provide specific information and access credentials. This page details everything needed to begin the 12-hour research period.


⚠️ Client Checklist (Confirm Before Day 1)

IMPORTANT: Before the backend developer begins Day 1, confirm all of the following:

  • API key is active and tested

    • Log into ElevenLabs account
    • Generate or locate existing API key
    • Test the API key with a simple request
  • Voice IDs are valid and accessible

    • All voice IDs are copied correctly (no typos)
    • Voice IDs are accessible with the provided API key
    • Voice IDs are in the correct format (e.g., 21m00Tcm4TlvDq8ikWAM)
  • Account has sufficient credits/quota for testing

    • Check account balance/credits
    • Research will generate ~18 audio samples (2 models × 3 voices × 3 samples)
    • Ensure sufficient quota to complete all testing
  • Client is available for questions during the research period

    • Developer may have questions during Day 1 or Day 2
    • Quick responses help keep research on schedule
    • Availability via email, Slack, or preferred communication channel
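
Two of the checklist items above can be checked offline before any API call is made. The Python sketch below flags voice IDs that look mistyped and enumerates the test matrix behind the ~18-sample estimate. The 20-character alphanumeric pattern is an assumption inferred from the example ID on this page, not a documented ElevenLabs rule, and the function names are hypothetical.

```python
import itertools
import re

# Assumption: voice IDs look like the example on this page
# (exactly 20 ASCII letters or digits). This only catches typos and
# stray whitespace; it does not prove the ID exists or is accessible.
VOICE_ID_PATTERN = re.compile(r"[A-Za-z0-9]{20}")


def looks_like_voice_id(voice_id):
    """Return True if the string has the expected shape of a voice ID."""
    return VOICE_ID_PATTERN.fullmatch(voice_id) is not None


def build_test_matrix(models, voice_ids, samples_per_pair=3):
    """List every (model, voice ID, sample number) combination to generate."""
    sample_numbers = range(1, samples_per_pair + 1)
    return list(itertools.product(models, voice_ids, sample_numbers))
```

With 2 models and 3 voice IDs, `len(build_test_matrix(models, voice_ids))` gives the number of generations the account quota must cover. A voice ID that passes the format check still needs to be confirmed against the API with the provided key.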

Required from Client

1. ElevenLabs Account Information

What we need:

  • Account email or account ID
  • Access level/subscription tier (Free, Starter, Creator, Pro, etc.)
  • Usage limits and quotas

Why we need it: Understanding the account tier and limits helps us plan testing within available quotas and identify any constraints that may affect production implementation.


2. API Credentials

What we need:

  • ElevenLabs API Key with appropriate permissions

Required API access:

  • Text-to-Speech endpoint
  • Voice listing/details endpoint
  • Voice validation capabilities

Why we need it: The API key is essential for all testing and validation work. It must have sufficient permissions to test text-to-speech generation, list available voices, and validate voice IDs.
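
As a rough illustration of what "sufficient permissions" means in practice, the Python sketch below describes one request per required endpoint. The base URL, the endpoint paths, the `xi-api-key` header name, and the `model_id` body field reflect our understanding of the public ElevenLabs REST API and are assumptions to verify against the current API reference. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL; verify before use


def auth_headers(api_key):
    """Header carrying the API key (header name is an assumption)."""
    return {"xi-api-key": api_key}


def voice_list_request(api_key):
    """Describe a request that lists the voices visible to this key."""
    return {"method": "GET", "url": f"{API_BASE}/voices",
            "headers": auth_headers(api_key), "body": None}


def voice_detail_request(api_key, voice_id):
    """Describe a request that fetches one voice, which validates its ID."""
    return {"method": "GET", "url": f"{API_BASE}/voices/{voice_id}",
            "headers": auth_headers(api_key), "body": None}


def tts_request(api_key, voice_id, text, model_id):
    """Describe a text-to-speech request for one voice and one model."""
    headers = {**auth_headers(api_key), "Content-Type": "application/json"}
    body = json.dumps({"text": text, "model_id": model_id})
    return {"method": "POST", "url": f"{API_BASE}/text-to-speech/{voice_id}",
            "headers": headers, "body": body}


def send(request, timeout=30):
    """Send a described request and return (HTTP status, response bytes)."""
    data = request["body"].encode("utf-8") if request["body"] else None
    req = urllib.request.Request(request["url"], data=data,
                                 headers=request["headers"],
                                 method=request["method"])
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Calling `send(voice_detail_request(api_key, voice_id))` for each provided voice ID exercises both the key and the ID. `urlopen` raises `urllib.error.HTTPError` on 4xx responses, so an invalid key or an inaccessible voice surfaces as an exception rather than a status code.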


3. Use Case Specifications

What we need:

  • Target audience (affects voice selection)
  • Any specific voice characteristics required (accent, tone, age, etc.)

Why we need it: Understanding the target audience and voice requirements helps us provide more relevant recommendations during the research phase.

Examples:

  • "Target audience: Young adults aged 18-35, prefer energetic and friendly tone"
  • "Need professional, neutral accent for business communications"
  • "Require warm, calming voice for meditation content"

Optional but Helpful

4. Voice IDs for Testing (Optional)

What we need (if client provides):

  • 3 voice IDs that the client wants to use
  • Mix of male and female voices recommended
  • Example format: 21m00Tcm4TlvDq8ikWAM (Rachel)
  • If using custom cloned voices, provide those IDs as well

If client doesn't provide voice IDs:

We'll use default voices for testing:

  • 1 Male voice - Professional, clear
  • 1 Female voice - Natural, engaging
  • 1 Male British voice - Accent variety

Why this helps:

If client provides their preferred voice IDs, we test with voices they actually want to use in production. This makes the research more relevant and actionable.

If not provided, we'll test with diverse default voices to demonstrate capabilities.

Where to find voice IDs:

Future requirement: The full implementation will need 8 voices chosen by the client.


5. Budget Constraints (Optional)

What helps us:

  • Maximum acceptable cost per generation
  • Monthly budget allocation for voice generation

Why it helps: Knowing the budget upfront lets us weigh quality against cost when comparing models, so the client can make an informed choice between them.

Note: Cost tracking is integrated into model testing, so you'll see actual costs per sample in the deliverable.
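
To show how a per-generation limit could be checked against test results, here is a minimal Python sketch. It assumes billing scales with the number of characters sent, and the price per 1,000 characters is a hypothetical input that must come from the client's actual plan. The function names are illustrative only.

```python
def estimate_cost(char_count, price_per_1k_chars):
    """Estimate the cost of one generation from its character count."""
    return char_count / 1000 * price_per_1k_chars


def budget_report(sample_char_counts, price_per_1k_chars, max_cost_per_generation):
    """Summarize estimated costs and list samples that exceed the limit."""
    costs = [estimate_cost(n, price_per_1k_chars) for n in sample_char_counts]
    over_limit = [i for i, cost in enumerate(costs)
                  if cost > max_cost_per_generation]
    return {
        "total": sum(costs),
        "highest": max(costs) if costs else 0.0,
        "over_limit": over_limit,  # zero-based indexes into sample_char_counts
    }
```

Running `budget_report` once per model, with that model's rate, gives a side-by-side view of which model stays within the stated maximum cost per generation.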


6. Quality Expectations (Optional)

What helps us:

  • Audio quality standards (professional, good, acceptable)
  • Clarity and naturalness requirements
  • Specific quality concerns or priorities

Why it helps: Understanding your quality expectations helps us evaluate the models more effectively. If you have specific quality standards (e.g., "must sound professional for corporate use" or "clarity is more important than naturalness"), we can focus our testing and recommendations accordingly.

Examples:

  • "Audio must be professional quality suitable for customer-facing content"
  • "Prioritize clarity over naturalness - every word must be understood"
  • "Natural, conversational tone is most important for our use case"

Audio Format Requirements (Pre-defined)

These settings are already defined for the MicDots platform:

  • Format: MP3
  • Bitrate: 128 kbps
  • Channels: Mono (single channel)
  • Purpose: Optimized for voice speech and slow internet connections

Why these settings:

  • MP3: Universal compatibility, good compression
  • 128 kbps: Sweet spot for voice quality vs. file size
  • Mono: Voice content doesn't need stereo, and a single channel needs roughly half the data of stereo at the same per-channel quality
  • Result: Fast downloads on slow connections, good quality for QR code use case

Note: These settings are fixed and should be used for all testing. No client input needed.
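
The effect of the 128 kbps setting on slow connections can be estimated with simple arithmetic. The Python sketch below assumes a constant bitrate and ignores MP3 headers and metadata, so the figures are approximations. The function names and the example connection speed are hypothetical.

```python
def mp3_size_bytes(duration_seconds, bitrate_kbps=128):
    """Approximate file size: bits per second times duration, 8 bits per byte."""
    return duration_seconds * bitrate_kbps * 1000 / 8


def download_seconds(size_bytes, connection_kbps):
    """Approximate download time over a connection of the given speed."""
    return size_bytes * 8 / (connection_kbps * 1000)
```

For example, a 30-second clip at 128 kbps is about 480,000 bytes, and `download_seconds(480_000, 512)` estimates 7.5 seconds on a 512 kbps connection.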


What Happens Next

Once all pre-requirements are confirmed:

  1. Day 1 (6 hours): Developer will test both models (Turbo v2.5 and Eleven Flash) with all provided voice IDs, validate voices, and gather technical metrics
  2. Day 2 (6 hours): Developer will create code samples, organize audio files, and complete the deliverable template
  3. Deliverable: Client receives comprehensive analysis with audio samples, technical metrics, and developer recommendation

Questions?

If any pre-requirements are unclear or you need help gathering this information, please contact the development team before Day 1 begins.
