Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
Polly’s Text-to-Speech (TTS) service uses advanced deep learning technologies to synthesize natural-sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.
This plugin provides AWS Polly Text-to-Speech services in two request modes:
Synthesize Speech (Sync): Synchronous request mode, useful for small files and time-sensitive applications.
Synthesize Speech Task (Async): Asynchronous request mode, useful for large files and time-insensitive applications; it requires an AWS S3 bucket.
In synchronous mode, the input text or SSML is limited to 6,000 characters in total, of which no more than 3,000 can be billed characters (SSML tags, for example, are not billed). The output audio stream (synthesis) is limited to 10 minutes; once that limit is reached, any remaining speech is cut off.
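For reference, the synchronous mode corresponds to Polly’s SynthesizeSpeech API, which returns the audio stream directly in the response. Here is a minimal sketch of that underlying call using boto3 in Python (the text, voice, and region are just placeholder values, not plugin settings):

```python
import boto3

# Create a Polly client (assumes AWS credentials are configured in the environment)
polly = boto3.client("polly", region_name="us-east-1")

# Synchronous request: the audio stream comes back directly in the response
response = polly.synthesize_speech(
    Text="Hello from Amazon Polly!",
    OutputFormat="mp3",
    VoiceId="Joanna",
)

# Save the returned audio stream to a file
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```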
In asynchronous mode, the input text can be up to 100,000 billed characters (200,000 characters in total); SSML tags are not counted as billed characters.
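The asynchronous mode corresponds to Polly’s StartSpeechSynthesisTask API, which writes the finished audio to S3 instead of returning it in the response. A rough boto3 sketch, assuming a hypothetical writable bucket named my-polly-output:

```python
import boto3

polly = boto3.client("polly", region_name="us-east-1")

# Placeholder for a long document (up to 100,000 billed characters in async mode)
long_text = "Chapter one. It was a bright cold day in April..."

# Asynchronous request: Polly writes the finished audio to the given S3 bucket
task = polly.start_speech_synthesis_task(
    Text=long_text,
    OutputFormat="mp3",
    VoiceId="Joanna",
    OutputS3BucketName="my-polly-output",  # hypothetical bucket name
)

task_id = task["SynthesisTask"]["TaskId"]

# The task starts as scheduled/inProgress and ends as completed or failed
status = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]["TaskStatus"]
print(task_id, status)
```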
To interact with AWS S3, it is highly recommended to use this plugin in conjunction with our AWS S3 & SQS Utilities plugin, which provides Put, Get, and Delete operations for files in AWS S3.
The plugin returns a list of available voices and, in synchronous mode, the audio data stream; in asynchronous mode it additionally returns the AWS TaskId and Status.
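The voice list mentioned above maps to Polly’s DescribeVoices API. For anyone curious what that looks like directly against AWS, a small boto3 sketch (the language code is just an example):

```python
import boto3

polly = boto3.client("polly", region_name="us-east-1")

# List every voice available for a given language
voices = polly.describe_voices(LanguageCode="en-US")["Voices"]
for voice in voices:
    print(voice["Id"], voice["Gender"], voice["LanguageName"])
```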
I want to build an app capable of reading entire books out loud, while highlighting the sentence that is currently being read (just like karaoke). AWS Polly uses Speech Marks for this.
Does your plugin support Speech Marks API requests? If not, is there any other way to achieve this using your plugin, or any other plugin you know of?
Just to let you know that this plugin now supports speech marks!
As Jeff would put it himself:
Speech marks are metadata that describe the speech that you synthesize, such as where a sentence or word starts and ends in the audio stream. When you request speech marks for your text, Amazon Polly returns this metadata instead of synthesized speech. By using speech marks in conjunction with the synthesized speech audio stream, you can provide your applications with an enhanced visual experience.
For example, combining the metadata with the audio stream from your text can enable you to synchronize speech with facial animation (lip-syncing) or to highlight written words as they’re spoken.
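For the karaoke-style highlighting use case, the request is the same SynthesizeSpeech call, but with the output format set to json and the desired speech mark types; Polly then returns a stream of newline-delimited JSON objects instead of audio. A minimal boto3 sketch (placeholder text and voice):

```python
import boto3
import json

polly = boto3.client("polly", region_name="us-east-1")

# Request sentence-level speech marks instead of audio
response = polly.synthesize_speech(
    Text="First sentence. Second sentence.",
    OutputFormat="json",
    SpeechMarkTypes=["sentence"],
    VoiceId="Joanna",
)

# Each line is a JSON object with the time (ms) and character offsets of one sentence
for line in response["AudioStream"].read().decode("utf-8").splitlines():
    mark = json.loads(line)
    print(mark["time"], mark["start"], mark["end"], mark["value"])
```

You would pair these timestamps with the audio stream from a separate audio-format request for the same text, highlighting each sentence as its timestamp is reached during playback.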
Perhaps @phrase9 will show a particular interest in this.