
Best practices for generating better-sounding TTS audio

Does your audio sound kind of robotic or weird?

Not all text-to-speech engines are created equal. The platform natively uses neural voices from Amazon, Google, Azure, and IBM. These TTS engines use different advanced algorithms and natural-sounding voices to generate more realistic speech.


TIP #1 - The platform allows you to import more voices from external providers like ElevenLabs, Play.ht, Fliki, and api.audio (and soon Murf.ai and Lovo) for even more flexibility and more realistic-sounding results! Simply go to Integrations and follow the instructions.

TIP #2 - You can import and use your REAL voice by uploading an audio file and lip-syncing it with the avatars.

Here are a few more tips and tricks to help you generate better-sounding text-to-speech audio:

1. Use the right voice for your content:


Different voices are suited for different types of content.

2. Adjust the speech rate and use a soundtrack:


Our editor allows you to adjust the speech rate and add a background soundtrack to make your audio more engaging.

Experiment with different settings to find the right balance for your content.

3. Use punctuation such as exclamation marks (!) and commas (,):


Adding punctuation to your text helps the text-to-speech engine understand the structure of each sentence and generate more natural-sounding speech.

For example: "Hi John! In addition to your previous message, I forgot to say that..."

4. Use <break> tags:


SSML (Speech Synthesis Markup Language) lets you control the duration of pauses between words and sentences. Use <break> tags to add natural-sounding pauses to your speech.

For example, you can add <break time="3s"/> to insert a 3-second pause at that point in your script.
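If you prepare scripts outside the editor, a small helper can insert the pauses for you. The Python sketch below is only an illustration: `add_pauses` is a hypothetical function, not part of the platform, and it assumes the target engine accepts a standard `<speak>` wrapper with `<break time="..."/>` tags.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def add_pauses(sentences, pause="500ms"):
    """Join sentences with an SSML <break> tag between each pair.

    Hypothetical helper for illustration; not a platform API.
    """
    # Accept only simple durations such as "500ms" or "3s".
    if not re.fullmatch(r"\d+(ms|s)", pause):
        raise ValueError(f"pause must look like '500ms' or '3s', got {pause!r}")
    # Escape &, < and > so the script text cannot break the markup.
    parts = [escape(s.strip()) for s in sentences]
    body = f' <break time="{pause}"/> '.join(parts)
    return f"<speak>{body}</speak>"


ssml = add_pauses(["Hi John!", "I forgot to mention one thing."], pause="3s")
ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
# <speak>Hi John! <break time="3s"/> I forgot to mention one thing.</speak>
```

Shorter pauses (for example 300ms to 500ms) usually sound more natural between sentences; save multi-second breaks for deliberate dramatic gaps.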

5. Get the correct pronunciation:



Sometimes a TTS model will not pronounce words correctly or in the way you like. You can experiment with misspellings of words until you get something that sounds better.

For example, “peekahchuu” might sound different from “pikachu”.
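Once you find respellings that sound right, you can keep them in one place and apply them to every script before generating audio. This is a minimal Python sketch; `RESPELLINGS` and `apply_respellings` are hypothetical names, and the respelling shown is just the example above, so tune your own entries by ear for each voice.

```python
import re

# Hypothetical respellings found by trial and error (keys must be lowercase).
RESPELLINGS = {"pikachu": "peekahchuu"}


def apply_respellings(text, respellings=RESPELLINGS):
    """Replace whole words with the respelling that the voice pronounces better."""
    if not respellings:
        return text
    # \b limits matches to whole words, so "pikachus" is left untouched.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in respellings) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: respellings[m.group(0).lower()], text)


print(apply_respellings("Say hello to Pikachu!"))
# Say hello to peekahchuu!
```

Keeping the original spelling in your source script and respelling only at generation time means any on-screen text still shows the correct word.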

6. Use SSML markup such as <emphasis>, <say-as>, and <prosody>:



Add more life to your speech and make it sound more realistic using advanced markups.

For more details, see the Amazon SSML Docs and/or the Google SSML Docs.
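As an illustration, the Python sketch below builds a short script that uses all three tags and checks that the markup is well-formed before you paste it into the editor. The sentence content is made up, and tag support varies by engine and voice, so treat the attribute values as assumptions to verify against your provider's docs.

```python
import xml.etree.ElementTree as ET

# Illustrative script: read a number digit by digit, stress one word,
# and slow down the closing sentence.
ssml = (
    "<speak>"
    'Your order number is <say-as interpret-as="digits">4521</say-as>. '
    'It ships <emphasis level="strong">today</emphasis>. '
    '<prosody rate="slow">Please keep this message for your records.</prosody>'
    "</speak>"
)

# Parsing catches unclosed or mistyped tags before the TTS engine does.
root = ET.fromstring(ssml)
tags = [element.tag for element in root.iter()]
print(tags)
# ['speak', 'say-as', 'emphasis', 'prosody']
```

If a voice ignores or rejects one of the tags, remove it and listen again; some voices support only a subset of SSML.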

7. Test and iterate:


Test the generated speech with different listeners and in different scenarios to get feedback and improve the final result.

Remember, text-to-speech technology is constantly evolving, so be sure to keep an eye out for new developments and updates.


© 2024 VideoMail Support