Instagram adds text-to-speech to Reels

Rachel Handley 12.Nov.2021

Instagram has added a text-to-speech feature to Reels, allowing creators to convert video captions into audio.

This appears to be an effort to keep up with TikTok, which launched text-to-speech in December 2020.

While popular (there are currently 1.5 billion views on the #texttospeech hashtag alone), TikTok’s AI voices have received a lot of negative attention. And Instagram could go down the same route.

For starters, Instagram is only offering two voices, both of which have an American accent: Voice 1 (a female voice) and Voice 2 (a male voice). The feature also has similar issues with inaccurate pronunciations and robotic-sounding speech.

These limitations are understandable in the context of these platforms. But audiences are increasingly exposed to advanced synthetic speech, so their expectations are high.

BeyondWords, for example, offers a library of over 720 voices across 64 languages, and the ability to create a custom voice. We also use natural language processing (NLP) and speech synthesis markup language (SSML) to ensure more accurate text-to-speech. And we’ve powered over one billion listens for over 120 global publishers.
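
To give a rough idea of how SSML can steer pronunciation, here is a minimal Python sketch. It is an illustration only, not BeyondWords' actual pipeline: the `ALIASES` table and the `to_ssml` function are hypothetical names made up for this example. It wraps known terms in the standard SSML `<sub>` element, which tells a speech synthesizer to read the `alias` text in place of the written form.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation overrides, invented for this example.
ALIASES = {
    "NLP": "natural language processing",
    "SSML": "S S M L",
}


def to_ssml(text, aliases=ALIASES):
    """Return SSML in which each known term is wrapped in a <sub> tag."""
    if not aliases:
        return "<speak>" + escape(text) + "</speak>"

    # Match longer terms first so overlapping keys resolve predictably.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so the output stays valid XML.
        parts.append(escape(text[last:match.start()]))
        term = match.group(1)
        parts.append(
            "<sub alias=" + quoteattr(aliases[term]) + ">" + escape(term) + "</sub>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml("We use NLP & SSML"))
```

How a given synthesizer handles `<sub>` and other SSML elements varies by engine, so treat the output as a starting point to test against the voice you are using.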

So, I expect Instagram’s text-to-speech feature to get its fair share of criticism. But it’s an impressive feature that will no doubt assist creators and their audiences.

How to use the text-to-speech feature on Instagram

1. Open Instagram and go to the Reels camera
2. Create your video, then select ‘Preview’
3. Tap ‘Aa’ to add a text caption
4. Tap the text bubble, then ‘...’, then ‘Text-to-Speech’
5. Select a voice, then tap ‘Done’
6. Make any other edits, then share your Reel

With BeyondWords, you and your team can convert any text into quality audio. We also offer all the distribution, analytics, and monetization tools you need. Create your free account now.
