💬 ᴺᴱᵂ ᴾᴸᵁᴳᴵᴺ Google Cloud - Speech to Text (incl. Speech Recorder + Automated Google Environment Setup!)

Hi Bubblers !

This plugin turns speech into text, so you can build applications that transcribe audio and create entirely new categories of speech-enabled products.

Accurately convert voice to text in over 125 languages and variants by applying Google’s powerful machine learning models with this plugin.

The plugin provides :

  • a first Workflow Action to trigger the analysis.
  • a second Workflow Action that returns the analysis progress rate and completion status and, once the analysis is completed, a list of transcriptions. Each transcription comes with a list of words and their timestamps, confidence rate, and audio channel (if applicable).

This plugin supports automatic punctuation and profanity filtering.

You can test out our Google Cloud - Speech to Text Plugin with the live demo.

Enjoy !
Made with :black_heart: by wise:able
Discover our other Artificial Intelligence-based Plugins

Bump for edit.

Would you guys be willing to make a custom version of this plugin for us with some extra features such as audio longer than 1 minute?

Sure we can ! Just DM’ed you on this.

Thanks. I just responded.

To make Google Speech easier to use, we have released an update of this plugin that supports long audio (lifting the 1-minute cap!).

We now also support automatic audio file detection through our analyzer, which automatically sets the parameters for the Google Speech API. Enjoy the magic !

Exclusive for you bubblers :heart: !

That's great, thanks guys. I have been a bit busy but this project is still going ahead. Do you also have a one-off purchase option, or only the monthly one?

All our plugins are subscription-based, allowing you to try out for a few cents, unsubscribe at any time, or keep it if you like it :slight_smile:

Ok cool, no problem. I will subscribe to this one this evening and start testing it out.

Can you put the link to the plugin?

Hey @anon26152888,

The links are the same :slight_smile:.

Hey guys

So I set this plugin up. I am just using the synchronous mode for now, and the results are concerning. Out of 7 short audio files, only 2 showed results, and those results were not very accurate. The other 5 returned no results even though their sound quality is not all that bad. I thought maybe a mistake in my setup was causing this, but then I uploaded the same files to your demo site and got the same results.

How do I go forward from here? Accuracy is important but not showing results is of course an even bigger concern.

Let me know if you want me to email you the audio files I used so you can do your own tests. All are in MP3 format, as I see the plugin doesn’t accept .wav files. All files are in US English.

Hey @phrase9,

Thanks for reaching out ! Here are our comments on the issues you raised.
Please keep in mind that the plugin's performance is tied to that of Google Speech-to-Text.

WAV files Support

Our plugin, like Google Speech-to-Text, supports .wav files. However, .wav is a file format, not an encoding per se.

From official Google Speech-to-Text documentation:

Audio formats vs encodings

Note that an audio format is not equivalent to an audio encoding. A popular file format like .WAV for example, defines the format of the header of an audio file, but is not itself an audio encoding. .WAV audio files often, but not always, use a linear PCM encoding; don’t assume a .WAV file has any particular encoding until you inspect its header.
Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio.
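For anyone who wants to check a file by hand, here is a minimal Python sketch of the header inspection the documentation refers to. It is not part of the plugin: the function name `read_wav_format` is made up for illustration, and mapping format tag 1 with 16-bit samples to LINEAR16 and tag 7 to MULAW is our reading of the RIFF/WAVE format tags, so treat it as an assumption.

```python
import struct


def read_wav_format(data: bytes) -> dict:
    """Read the 'fmt ' chunk from the first bytes of a RIFF/WAVE file."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    pos = 12  # the first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # First 16 bytes of 'fmt ': format tag, channels, sample rate,
            # byte rate, block align, bits per sample (all little-endian).
            tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            if tag == 1 and bits == 16:
                encoding = "LINEAR16"  # 16-bit linear PCM
            elif tag == 7:
                encoding = "MULAW"  # 8-bit mu-law
            else:
                encoding = None  # neither of the two encodings quoted above
            return {
                "format_tag": tag,
                "encoding": encoding,
                "channels": channels,
                "sample_rate_hz": rate,
                "bits_per_sample": bits,
            }
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size % 2)

    raise ValueError("no 'fmt ' chunk found")
```

Reading the first few kilobytes of the file is normally enough, for example `read_wav_format(open("sample.wav", "rb").read(4096))`. If `encoding` comes back as `None`, the file is a WAV container holding some other encoding.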

MP3 files support

Bear in mind that MP3 is still in Beta for Google Speech, as indicated in the Google Speech-to-Text documentation.

MP3 encoding is a Beta feature and only available in v1p1beta1

Should you want to transcribe MP3, we would be happy to customise our plugin to support this Beta feature. Please reach out to us directly via DM if you would like this specific Beta support.

Regarding possible mistakes in audio configuration of the plugin

To make sure your setup (and, more importantly, the detection configuration sent to Google) is correct, we recommend detecting the encoding and configuring our plugin automatically with the following plugin we posted before, as demonstrated in our demo:

If the performance of Google Speech-to-Text platform is not sufficient for your use-case, one solution would be to test with another provider, such as AWS Transcribe.

Thanks for the quick reply.

I have noted what you said: it accepts .wav, and MP3 is in beta. But it is strange that every single .wav file I uploaded (seven to ten files) was rejected, whereas two MP3 files were accepted and gave results, as I said. The other five were not rejected with a message; they simply gave no results.

So this means I have been unlucky, in that every single one of the .wav files had the wrong encoding? How will your audio analyzer resolve this?

More specifically:

Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio.

The Audio Analyzer parses the audio file, detects its encoding, sampling rate, and channels, and passes the relevant configuration to our plugin, increasing the chances of an accurate transcription.

Please note that it does not make the input file compatible. It makes sure that the Google Speech-to-Text transcription configuration matches the file's encoding characteristics, and leaves the processing to the Google Speech-to-Text engine.
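To illustrate the idea (this is a rough Python sketch, not the Audio Analyzer's actual code), the detected properties can be turned into the `config` object that Google Speech-to-Text expects. The function name `build_recognition_config` is hypothetical; the keys `encoding`, `sampleRateHertz`, `audioChannelCount` and `languageCode` are the field names we understand the REST API to use, and only the two WAV encodings quoted above are accepted here.

```python
# Encodings the quoted documentation lists for WAV files.
WAV_ENCODINGS = {"LINEAR16", "MULAW"}


def build_recognition_config(encoding, sample_rate_hz, channels,
                             language_code="en-US"):
    """Map detected audio properties to a Speech-to-Text style config dict."""
    if encoding not in WAV_ENCODINGS:
        # Detection does not convert anything: an unsupported file is
        # reported here instead of being sent with a wrong configuration.
        raise ValueError(f"unsupported WAV encoding: {encoding!r}")
    if sample_rate_hz <= 0 or channels < 1:
        raise ValueError("sample rate and channel count must be positive")
    return {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hz,
        "audioChannelCount": channels,
        "languageCode": language_code,
    }


# Example: a mono 16 kHz LINEAR16 file in US English.
config = build_recognition_config("LINEAR16", 16000, 1)
```

The point of the sketch is the second sentence above: the file itself is left untouched, and only the configuration is made to match it.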

  1. Can you please guide me to a resource where I can find various .wav audio samples with LINEAR16 or MULAW encoded audio that are also shorter than one minute? I am testing synchronous mode for now and want to test with more than just the one file from your demo.

  2. So I should actually tell my users that we in fact do not offer speech to text, but only .wav to text, and not all .wav, but only .wav with a specific encoding?

Unfortunately, we do not maintain a repository of sample test files. The best option would be to either search the web for some or encode files yourself for your testing needs.

The encodings supported by Google Speech-to-Text (8 encodings + 1 in Beta at the moment) are as follows:

We are able to support MP3 if need be:

Finally:

I do not want to change the service; I want to figure out how to get this plugin to work properly. I refuse to believe that Google Speech-to-Text, the most popular speech-to-text API, is rejecting every single audio file I am inputting into it. How are people using this API in apps if 90% of audio files do not work? I just spent the last 20 minutes copying over 10 random MP3 and .wav URLs from this website (Free audio samples, drum loops & kits, vocals, royalty free music) into the app, and exactly zero of them gave a result. The only times I have seen this plugin work are with the 2 MP3 (not .wav) files I put in, and on your demo.

We are able to support MP3 if need be:

It already transcribed 2 mp3 files as I mentioned. So what does the customization do that does not already exist?

What I would like to know is this: What automated solutions do companies and users who use this API utilize, to make it usable for their users? The average website user knows nothing about audio encoding formats.

The setup instructions are on the plugin page, where you will also find the link to the demo's editor.
The input audio files must satisfy the supported encodings:

We provide this plugin to automatically detect and configure the required inputs for our plugin, and therefore for Google Speech-to-Text, so that users do not have to know the details of the audio file.

Here is a WAV sample file used with the plugin.

The customisation consists of routing the requests to the v1p1beta1 version of the Google Speech-to-Text APIs and adapting the input parameters as per the v1p1beta1 API specifications.
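As a rough idea of what that routing means in practice, here is a Python sketch that only builds the request and does not send it. It is an illustration under assumptions, not the plugin's code: the function name `build_mp3_request` is hypothetical, the URL and the `MP3` encoding value follow our understanding of the v1p1beta1 REST API, and authentication is left out entirely.

```python
import base64
import json

# Assumed base URL of the beta REST endpoint for synchronous recognition.
BETA_RECOGNIZE_URL = "https://speech.googleapis.com/v1p1beta1/speech:recognize"


def build_mp3_request(audio_bytes, sample_rate_hz, language_code="en-US"):
    """Return (url, json_body) for a synchronous MP3 recognition request."""
    body = {
        "config": {
            "encoding": "MP3",  # per the quoted docs, only in v1p1beta1
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # Inline audio is sent base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    return BETA_RECOGNIZE_URL, json.dumps(body)


# Example with placeholder bytes standing in for real MP3 data.
url, payload = build_mp3_request(b"\xff\xfb\x90\x00", 44100)
```

The only differences from a regular request in this sketch are the `v1p1beta1` segment in the URL and the `MP3` encoding value.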

Ok, I will buy your audio file analyzer later this evening and continue testing thereafter. And yes, I would also like the MP3 add-on.