How we view the text-to-speech audio market
Please note that this post was published before SpeechKit rebranded to BeyondWords.
The speech services market is young but already being disrupted. How do you compare company X with company Y in such a fast-moving environment?
In this post, we lay out how we see this rapidly evolving market and what differentiates one company from another. First, some background...
Text-to-speech has often been overlooked as a revolutionary new media format. Over recent years, as with much nascent AI, text-to-speech has been scrutinised over error rates, poor voice quality, and its ‘robotic-ness’. After four years of working to improve this technology, listening through hours upon hours of audio data, our team is well versed in these imperfections.
An early test trained on voice data from Desert Island Discs.
A recent test trained on just 10 minutes of public data.
However, great improvements have been made, both by our team and by our peers. Last year, Nic Newman at the Reuters Institute predicted audio articles would become ‘standard’. New neural voices from SpeechKit, Amazon, and Google, among others, are coming closer to human speech than could ever have been imagined. Furthermore, text-to-speech is starting to show very real results. The average engagement time of a listener is more than four times that of a non-listener.
Text-to-speech is being adopted to improve customer value and fast-track digital transformation. Below we’ve segmented the market into three categories to make for easier comparison.
Cloud Service APIs
Services such as Amazon Polly, Google WaveNet and IBM Watson provide text-to-speech APIs as part of their cloud service platforms. These products have been developed over the last 10-15 years, producing some of the most robust voices on the market. Early adopters, such as Bloomberg and The Globe and Mail, rolled out Amazon Polly across their publishing to give their readers the option to listen. Whilst high quality, these services lack utility and customization; they’ve produced very powerful APIs, yet haven’t created any tools (hosting, CMS integrations, analytics, monetization) to help publishers get the most from text-to-speech.
Strengths: reliability, voice-quality, cost.
Weaknesses: ease-of-integration, customization, publishing tools.
Custom Voice Startups
Speech startups, such as Resemble and Sonantic, aim to out-compete the cloud services mentioned above by creating custom synthetic voices. These companies also have deep expertise in machine learning, developing high-quality voices for sectors such as gaming and customer service. As with the cloud services, whilst these companies have developed advanced APIs, they’ve neglected publishing tools. Neither of these examples provides audio distribution tools, analytics or monetization options for publishing at scale.
Strengths: voice-quality, customization, ease-of-integration.
Weaknesses: publishing tools, cost, scalability.
Text-To-Speech Startups
Startups, such as Trinity Audio and Play.ht, are our most direct competitors. Like us, they are developing tools around text-to-speech, creating value for publishers and bloggers. Trinity Audio have focused on monetizing APIs created by the cloud services mentioned above. They’ve built ad insertion technology and partnered with multiple audio DSPs to improve fill-rates and potential CPMs. Their business model aims to provide revenue back to publishers who adopt their tech, while they take a cut. Whilst this is sound in principle, we believe a focus on voice quality, infrastructure and customer service most benefits our customers in the long term.
Strengths: monetization, cost, scalability.
Weaknesses: ease-of-integration, voice-quality, customization.
SpeechKit
In comparison, SpeechKit has focused on becoming the ‘full-stack’ service for automated audio publishing. We help news publishers, institutions, corporate businesses and bloggers publish their written content in audio, automatically and at scale. We believe that improving access to quality content, through seamless audio integrations, is the real power of text-to-speech. Our strengths lie in building scalable products, using domain-specific voices, that improve the effectiveness of digital publishing. We’ve battle-tested our products across 150+ news sites and built out tools including analytics, APIs, SDKs and an audio CMS (take a look at our docs). With less than 1% of the world’s content currently available in audio, we have high hopes for the coming years!
Check it out for yourself: schedule a demo with a team member today.