Please note that this post was published before SpeechKit rebranded to BeyondWords.
As consumer demand for audio increases, more publishers are turning to text-to-speech. AI audio publishing platforms like SpeechKit offer a quick and cost-effective way to make written content listenable.
But how are visitors engaging with these AI audio–enabled articles?
To find out, we’ve compared listener and non-listener engagement across more than 28 million sessions in which the SpeechKit Player was loaded last year.
AI audio captures the attention of new users
The average new user spends just 2 seconds on-site when they don’t engage with AI audio. This increases to 225 seconds (+11,150%) when they do press play.
In an increasingly competitive online world, publishers need to do more to stand out and convince new readers that their content is worth the time. Offering your written content in an audio format is an effective way to capture their attention.
AI audio keeps visitors coming back for more
Listeners are 32% more likely than non-listeners to engage in multiple sessions, which suggests that audio encourages repeat visits. A Northwestern University study found that frequency of consumption is the biggest predictor of subscriber retention in digital news, so AI audio can play a key role in sales and customer lifetime value.
“The number of cases for how audio fits in with media companies’ subscription businesses is growing.” — Lucinda Southern, Media Editor at Adweek
Plus, returning users are 38% more likely to press play than new users. This suggests that audio articles are more popular with habitual visitors, who drive the most revenue in both advertising- and subscription-based models.
When returning visitors do opt to listen, they stay on-site for longer (+688%) and visit more pages along the way (+11.48%).
Listeners visit more pages
The average non-listener visits 1.17 pages per session, whereas the average listener visits 1.39 pages (+19%) before leaving a site, according to our analysis.
This finding suggests that listening experiences encourage visitors to stick around and explore. By providing content in a high-quality audio format, which some may find more accessible or engaging than text, publishers give themselves a better chance of keeping visitors on-site, potentially driving more ad revenue.
Listeners spend longer on-site
We found that non-listening sessions typically last just 30 seconds, whereas sessions involving audio last for 322 seconds (+973%). This means that users stay on-site over 10x longer when they press play.
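For readers who want to check the arithmetic, the uplift figures quoted in this post are consistent with a simple relative-change calculation. Here is a minimal sketch in Python, assuming the published averages are the inputs; the function name `percent_change` is ours and purely illustrative, not part of any SpeechKit tooling.

```python
def percent_change(baseline: float, observed: float) -> float:
    """Relative change from baseline to observed, expressed as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100


# Averages quoted in this post (illustrative inputs only).
session_uplift = percent_change(30, 322)    # seconds: non-listening vs. listening sessions
new_user_uplift = percent_change(2, 225)    # seconds: new users without vs. with audio
pages_uplift = percent_change(1.17, 1.39)   # pages per session: non-listeners vs. listeners

print(f"Session duration: +{session_uplift:.1f}%")
print(f"New-user time on-site: +{new_user_uplift:,.0f}%")
print(f"Pages per session: +{pages_uplift:.1f}%")
```

The results land at roughly 973%, 11,150% and 19%, matching the rounded figures quoted in this post.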
As well as boosting brand loyalty and revenue, this increased engagement could benefit your search engine optimization (SEO) performance.
According to Backlinko: “Google pays very close attention to ‘dwell time’: how long people spend on your page when coming from a Google search. [...] The longer time spent, the better.”
Making your written content listenable could therefore improve your search engine rankings, meaning that more people discover your content organically.
Audio engages at every age
Our analysis backs the established idea that younger demographics gravitate towards audio content. 18-to-34-year-olds represented 28.02% of our sample but 37.92% of listening sessions, making this the age group most likely to press play (about 1.5x as likely as those aged 35 and over). Audio engagement also had the biggest impact on session duration in this age group (+1,109%).
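The 1.5x comparison can be reproduced from the two shares quoted above. This is a rough sketch only. It assumes that the 28.02% figure is a share of all sessions in the sample and that every remaining session belongs to someone aged 35 or over; the post does not say how sessions of unknown age were handled. The helper name `relative_likelihood` is hypothetical.

```python
def relative_likelihood(share_of_sample: float, share_of_listens: float) -> float:
    """How many times as likely a group is to listen, compared with everyone else.

    Both arguments are fractions between 0 and 1 (exclusive). Each group's
    listening rate is proportional to its share of listening sessions divided
    by its share of the sample, so the overall listening rate cancels out.
    """
    group_rate = share_of_listens / share_of_sample
    rest_rate = (1 - share_of_listens) / (1 - share_of_sample)
    return group_rate / rest_rate


# Assumption: 18-to-34s are 28.02% of sessions and 37.92% of listening sessions.
ratio = relative_likelihood(0.2802, 0.3792)
print(f"18-to-34s vs. 35 and over: {ratio:.2f}x")
```

Under these assumptions the ratio comes out a little above 1.5, in line with the rounded figure quoted above.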
However, it was listeners aged 55+ who remained on-site the longest, clocking an average session duration of 407 seconds. This group also visited the most pages per session (1.765), which represented the biggest increase versus non-listeners (+48%). This suggests that engaging older audiences with audio can deliver the biggest payoff.
Audio interaction also had a positive impact on the engagement of 35-to-54-year-olds, corresponding with a 2% increase in pages per session and an 847% increase in session duration.