🗣 ᴺᴱᵂ ᴾᴸᵁᴳᴵᴺ AWS Polly - Text to Speech Plugin (incl. Automated AWS Environment Setup!)

Hi Bubblers!

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
Polly’s Text-to-Speech (TTS) service uses advanced deep learning technologies to synthesize natural sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.

This plugin provides AWS Polly - Text to Speech services in two request modes:

  • Synthesize Speech (Sync): Synchronous request mode, useful for small inputs and time-sensitive applications.
  • Synthesize Speech Task (Async): Asynchronous request mode, useful for large inputs and applications that are not time-sensitive. It requires an AWS S3 Bucket.
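
For anyone curious what these two modes look like outside Bubble, here is a minimal sketch of the corresponding Amazon Polly calls through boto3. It is an assumption about the underlying API, not the plugin's own implementation. `build_polly_request` and `synthesize` are hypothetical helper names, and the voice `Joanna` and the `mp3` output format are example values only.

```python
from typing import Any, Dict, Optional


def build_polly_request(text: str, voice_id: str = "Joanna",
                        bucket: Optional[str] = None) -> Dict[str, Any]:
    """Return keyword arguments for a Polly call (hypothetical helper).

    Without a bucket the dict suits `synthesize_speech` (sync); with a
    bucket it suits `start_speech_synthesis_task` (async).
    """
    params: Dict[str, Any] = {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
    }
    if bucket is not None:
        # Async tasks write their audio file to this S3 bucket.
        params["OutputS3BucketName"] = bucket
    return params


def synthesize(text: str, bucket: Optional[str] = None) -> Dict[str, Any]:
    """Send the request with boto3; needs AWS credentials and network access."""
    import boto3  # imported lazily so the helper above works without boto3

    polly = boto3.client("polly")
    params = build_polly_request(text, bucket=bucket)
    if bucket is None:
        # Sync: the audio comes back directly in the response.
        response = polly.synthesize_speech(**params)
        return {"audio": response["AudioStream"].read()}
    # Async: only a task description comes back; the audio lands in S3 later.
    task = polly.start_speech_synthesis_task(**params)["SynthesisTask"]
    return {"task_id": task["TaskId"], "status": task["TaskStatus"]}
```

Calling `synthesize("Hello")` would return the audio bytes, while `synthesize("Hello", bucket="my-bucket")` would return a task ID and status to poll. The bucket name here is a placeholder.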

In synchronous mode, the input text or SSML is limited to 6000 characters in total, of which no more than 3000 can be billed characters (e.g. excluding punctuation, spaces, SSML tags and such). The output audio stream (synthesis) is limited to 10 minutes; once that limit is reached, any remaining speech is cut off.

In asynchronous mode, the limit for the input text can be up to 100,000 billed characters (200,000 total characters). SSML tags are not counted as billed characters.
To interact with AWS S3, we highly recommend using this plugin in conjunction with our AWS S3 & SQS Utilities plugin, which provides the Put, Get, and Delete file operations on AWS S3.
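
The character limits for the two modes can be turned into a small pre-check that picks a request mode before calling the plugin. This is a rough sketch: `billed_length` and `pick_mode` are hypothetical names, and billed characters are approximated by stripping SSML tags only. That approximation can only overestimate the billed count, so it errs on the safe side.

```python
import re

# Limits as stated in this post.
SYNC_BILLED, SYNC_TOTAL = 3_000, 6_000
ASYNC_BILLED, ASYNC_TOTAL = 100_000, 200_000

# Anything between angle brackets is treated as an SSML tag (assumption).
_TAG = re.compile(r"<[^>]+>")


def billed_length(text: str) -> int:
    """Approximate the billed character count by dropping SSML tags."""
    return len(_TAG.sub("", text))


def pick_mode(text: str) -> str:
    """Return "sync", "async", or "too_long" for the given input."""
    total, billed = len(text), billed_length(text)
    if total <= SYNC_TOTAL and billed <= SYNC_BILLED:
        return "sync"
    if total <= ASYNC_TOTAL and billed <= ASYNC_BILLED:
        return "async"
    # Beyond the async limits the text has to be split into several requests.
    return "too_long"
```

For a whole book, `pick_mode` would most likely return `"too_long"`, so the text would need to be split by chapter or section first.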

The plugin returns a list of available voices, the audio data stream in synchronous mode, and, additionally, the AWS TaskId and Status in asynchronous mode.

You can test out our AWS Polly - Text to Speech Plugin with the live demo.

Enjoy!
Made with :black_heart: by wise:able
Discover our other Artificial Intelligence-based Plugins

Hi @redvivi,

I want to build an app capable of reading entire books out loud while highlighting the sentence that is currently being read (just like in karaoke). AWS Polly uses Speech Marks for this.

Does your plugin support Speech Marks API requests? If not, is there any other way to achieve what I want using your plugin or any other plugin that you know of?

Not yet, but will definitely put it in the improvement roadmap :slight_smile:

Hello Bubblers!

Just to let you know that this plugin has been updated to provide an automated script to configure your AWS environment :man_mechanic:t3:.

Enjoy!

Hi @redvivi

Can you describe how to use the automated script to configure AWS? Is there a link?

Hey @JohnMark!

As with all our plugins, the setup instructions are described in the instructions section of the plugin page.

Feel free to reach out if you have any further inquiries!

Hey Bubblers!

Just to let you know that this plugin now supports speechmarks!

As Jeff would put it himself:

Speech marks are metadata that describe the speech that you synthesize, such as where a sentence or word starts and ends in the audio stream. When you request speech marks for your text, Amazon Polly returns this metadata instead of synthesized speech. By using speech marks in conjunction with the synthesized speech audio stream, you can provide your applications with an enhanced visual experience.

For example, combining the metadata with the audio stream from your text can enable you to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken.
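
To make the highlighting use case concrete, here is a minimal sketch of how speech marks could drive a karaoke-style display. It assumes the marks arrive in Amazon Polly's raw format, one JSON object per line with `time` (milliseconds), `type`, `start`, `end`, and `value` fields. How the plugin exposes them inside Bubble may differ. `parse_speech_marks`, `active_mark`, and the sample data are hypothetical.

```python
import bisect
import json
from typing import List, Optional

# Made-up sample in Polly's line-delimited JSON speech mark format.
SAMPLE = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":12,"value":"Hello there."}',
    '{"time":1200,"type":"sentence","start":13,"end":25,"value":"How are you?"}',
])


def parse_speech_marks(raw: str, mark_type: str = "sentence") -> List[dict]:
    """Parse one JSON object per line and keep marks of one type, by time."""
    marks = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return sorted((m for m in marks if m["type"] == mark_type),
                  key=lambda m: m["time"])


def active_mark(marks: List[dict], position_ms: int) -> Optional[dict]:
    """Return the mark being spoken at the given playback position."""
    times = [m["time"] for m in marks]
    # The last mark whose start time is at or before the playback position.
    index = bisect.bisect_right(times, position_ms) - 1
    return marks[index] if index >= 0 else None
```

As the audio plays, calling `active_mark(marks, current_time_ms)` on each player time update gives the sentence to highlight. Passing `mark_type="word"` would do the same word by word.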

Perhaps @phrase9 will take a particular interest in this :slight_smile:

Hi there :wave: The demo for this plugin doesn’t seem to be functioning in sync mode. Is the plugin still supported?

It is and I don’t see any errors :wink: