I see what you mean. I don’t think there is. All Hume returns is an array of emotions, each with a name and a score. I just sort the list by score before populating the state. They don’t seem to differentiate between negative and positive emotions.
You may have to build some sort of system in Bubble for that, or I could attempt to do it through the code, I guess. It almost feels like you’d need to assign a positive or negative value to each emotion. We already have Name and Score fields defined through those states, so if there was a third field called “???” (can’t think of a name for it right now), it could contain either a positive or negative text value? Each emotion would then need to be assigned the correct value, not sure!
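If I did do it in code, it would probably look something like this rough sketch: a hand-maintained lookup that tags each emotion with a valence before the list gets sorted into the states. The emotion names and labels below are just placeholders for illustration, not something Hume’s API provides.

```typescript
// Hypothetical sketch: tag each emotion returned by Hume with a positive/negative
// valence from a hand-maintained lookup, then sort by score before populating states.
// The lookup itself is an assumption, not part of Hume's response.
type Emotion = { name: string; score: number };
type LabelledEmotion = Emotion & { valence: "positive" | "negative" | "neutral" };

const VALENCE: Record<string, "positive" | "negative"> = {
  Joy: "positive",
  Admiration: "positive",
  Anger: "negative",
  Sadness: "negative",
  // ...one entry for each emotion name Hume can return
};

function labelAndSort(emotions: Emotion[]): LabelledEmotion[] {
  return emotions
    .map((e) => ({ ...e, valence: VALENCE[e.name] ?? "neutral" })) // default when unmapped
    .sort((a, b) => b.score - a.score); // highest score first, same as now
}
```

The third state field (whatever we end up calling it) would then just hold that valence text for each entry.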
This update includes a number of changes. The Whisper model is now included and can be used as an alternative to Deepgram for transcribing audio into text. Previously, if you wanted to speak to the BOT via the microphone, you had to use Deepgram, which opened up a websocket to stream the audio from the microphone and awaited the transcription in the response.
If you use Whisper instead, the voice activity detector (Silero VAD) is used, which does a much better job of handling speech interruptions, and the audio is then transcribed through this model before being passed on to the assistant. It works better, it’s quicker and it’s free, but it lacks configuration (there is none right now). This is the same model that OpenAI currently uses for their speech-to-text service.
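Conceptually the flow is: the VAD detects the end of speech, the captured audio is run through the Whisper model, and the resulting text is handed to the assistant. Here’s a rough sketch of just the transcription step; the OpenAI-style endpoint is used purely for illustration and isn’t necessarily how or where the plugin runs the model.

```typescript
// Illustrative sketch only: transcribe a captured audio clip with a Whisper model
// via an OpenAI-compatible endpoint. The plugin's own transcription step may differ.
async function transcribe(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "speech.webm"); // the clip the VAD finished capturing
  form.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  const data = await res.json();
  return data.text; // this text is what gets passed on to the assistant
}
```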
The configuration options on the plugin element have all changed to accommodate this, and the instructions page has been updated with descriptions of what each option does, along with a screenshot.
The text-to-speech options have also been updated; you now have three to choose from. Previously, Deepgram and ElevenLabs were available, so I’ve added the OpenAI voices too, which I believe are cheaper to use.
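If you’re curious what the OpenAI voices option boils down to, it’s essentially a call to OpenAI’s speech endpoint, something like the sketch below. The model and voice names are OpenAI defaults and the plugin’s actual request may set them differently.

```typescript
// Sketch: generate speech audio from text with OpenAI's TTS endpoint.
// "tts-1" and "alloy" are OpenAI defaults; the plugin's request may use other settings.
async function speak(text: string, apiKey: string): Promise<Blob> {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: text }),
  });
  return res.blob(); // audio (mp3 by default), ready to play back in the browser
}
```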
Vision capabilities - yes the AI BOT can now see everything!
It will also measure facial expressions when Hume AI is enabled.
Update v1.30.0
This update includes Vision, which is really exciting! (well, at least for me anyway). There’s a small section at the bottom of the plugin element settings that looks like this…
When this is enabled, the plugin will attempt to start the camera device. Whenever a question is asked that would generally require some sort of sight/vision, it takes a snapshot of the current video frame and sends that, along with the user’s question, to a model hosted by Groq that’s capable of analysing the image and returning an answer to the question.
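For anyone interested, the snapshot-and-ask step is conceptually along these lines. The Groq model name below is only an example, and the plugin’s actual request is a bit more involved.

```typescript
// Rough sketch: grab the current video frame and ask a Groq-hosted vision model about it.
// The model name is an example only; the plugin's real request differs.
async function askAboutFrame(video: HTMLVideoElement, question: string, groqKey: string) {
  // Snapshot the current frame onto a canvas and encode it as a data URL
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  const imageUrl = canvas.toDataURL("image/jpeg");

  // Groq's API is OpenAI-compatible, so the image goes in as an image_url content part
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${groqKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.2-11b-vision-preview", // example model name only
      messages: [{
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the answer passed back to the user
}
```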
From my testing, this seems to be working really well and I’ve tested on various desktop/mobile devices. I’ve yet to add a way to change the camera device, but that will get added over the next couple of days.
The demo page has this enabled, so you can try it yourself by asking what it can see around you.
There is an option in the settings where you can have the camera stream pop up into a draggable element if needed. It’s good for testing, and maybe useful for other functionality. If you require any changes to this (CSS, for example), feel free to let me know.
Hi Paul,
I’m getting this type of error message even though the Groq API key is properly filled in on the token page, and the token is correctly entered on the plugin page. What did I miss?
That’s it, thank you.
I see the problem, it’s the https:// part in the referrer field on the tokenAuth page!
I’ve just fixed it for you, if you refresh it should hopefully work now (let me know).
I’ve just updated the tokenAuth page. It should now work both with and without the protocol.
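For reference, the change is essentially to normalise the referrer before comparing it, something like the simplified sketch below (not the exact tokenAuth code).

```typescript
// Simplified sketch of the tokenAuth fix: match referrers with or without the protocol.
function normaliseReferrer(value: string): string {
  return value
    .trim()
    .replace(/^https?:\/\//i, "") // drop "http://" or "https://" if present
    .replace(/\/+$/, "");         // drop any trailing slash
}

function referrerMatches(stored: string, incoming: string): boolean {
  return normaliseReferrer(stored) === normaliseReferrer(incoming);
}
```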
@pork1977gm
I’m experiencing a recurring issue with Text-to-Speech output in my chatbot, using both ElevenLabs and Deepgram. The volume within a single phrase varies significantly, with some parts of a sentence becoming almost inaudible. This inconsistency is quite noticeable and impacts the overall experience.
Have you encountered this issue before with either service? Do you have any suggestions or insights on how I could resolve it?
When it occurs, is there any other background noise going on? The default options are relatively sensitive, but it might not be related to this.
To be sure, set the “VAD lower volume” value to 1, which will disable it. Note: if you’re using the Whisper model for transcribing speech, then you can’t disable it.
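To give a rough idea of what that setting does: while the VAD detects speech, the playback volume is scaled down by that value, so 1 means no change. Very roughly, it works like the simplified sketch below (not the plugin’s actual code).

```typescript
// Simplified sketch of the "VAD lower volume" idea (not the plugin's actual code):
// while the VAD detects speech, playback volume is multiplied by the configured value,
// so a value of 1 leaves the volume untouched and effectively disables the ducking.
function applyVadDucking(playback: HTMLAudioElement, vadActive: boolean, lowerVolume: number) {
  playback.volume = vadActive ? lowerVolume : 1;
}
```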
Let me know how you get on. If you have a URL where I can try it, fire that over to me too and I’ll test it my end.
Ah ok, yeh I need to work on that. The echo cancellation and noise cancellation options for the audio stream that’s opened up in the browser aren’t particularly great, and I’ve had the same problem myself. I’ll do a bit of research into this area and try to make some improvements around it.
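For context, those options come down to the constraints the microphone stream is opened with. Below is a sketch of the standard getUserMedia audio constraints rather than the plugin’s exact code; browsers treat them as hints, which is partly why results vary so much between devices.

```typescript
// Sketch: the standard audio constraints a browser microphone stream can be opened with.
// These are hints to the browser, so the actual processing varies between devices.
async function openMicrophone(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true,  // try to remove the bot's own speech from the mic input
      noiseSuppression: true,  // try to filter out steady background noise
      autoGainControl: true,   // let the browser level out the input volume
    },
  });
}
```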
I’d love to see some finished apps or prototypes built with this AI plugin; there’s a lot you can do with it. Anyone with links to some finished products, put them up!
I’m struggling to replicate this. Can I ask where in your workflows you’re running the ‘Start microphone’ action? I’m wondering if it’s running just a little too early.
As a test, do you think you could throw in a pause action just beforehand? Then let me know what happens.
Hi @pork1977gm Hope you’re well these days. I’m encountering an error when trying to load a saved conversation.
Likewise, when I update past version 1.35.0, I start to get a console error that seems to be related to the “total_tokens” state. Below you can see both error messages. Any ideas? Thanks!