I see what you mean. I don’t think there is. All Hume returns is an array of emotions, each with a name and a score. I just sort the list by score before populating the state. They don’t seem to differentiate between negative and positive emotions.
You may have to build some sort of system in Bubble for that, or I could attempt to do it through the code, I guess. It almost feels like you’d need to assign a positive or negative value to each emotion. We already have Name and Score fields defined through those states, so if there was a third field called “???” (can’t think of a name for it right now), it could contain either a positive or negative text value? Each emotion would then need to be assigned the correct value, not sure!
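If I did do it in code, it would probably look something like this rough sketch: a hand-maintained lookup that tags each emotion with a valence before the list gets sorted into the states. The emotion names and labels below are just placeholders for illustration, not something Hume’s API provides.

```typescript
// Hypothetical sketch: tag each emotion returned by Hume with a positive/negative
// valence from a hand-maintained lookup, then sort by score before populating states.
// The lookup itself is an assumption, not part of Hume's response.
type Emotion = { name: string; score: number };
type LabelledEmotion = Emotion & { valence: "positive" | "negative" | "neutral" };

const VALENCE: Record<string, "positive" | "negative"> = {
  Joy: "positive",
  Admiration: "positive",
  Anger: "negative",
  Sadness: "negative",
  // ...one entry for each emotion name Hume can return
};

function labelAndSort(emotions: Emotion[]): LabelledEmotion[] {
  return emotions
    .map((e) => ({ ...e, valence: VALENCE[e.name] ?? "neutral" })) // default when unmapped
    .sort((a, b) => b.score - a.score); // highest score first, same as now
}
```

The third state field (whatever we end up calling it) would then just hold that valence text for each entry.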
This update includes a number of changes. The Whisper model is now included and can be used as an alternative to Deepgram for transcribing audio into text. Previously, if you wanted to speak to the BOT via the microphone, you had to use Deepgram, which opened up a websocket to stream the audio from the microphone and awaited the transcription in the response.
If you use Whisper instead, the voice activity detector (Silero VAD) is used, which does a much better job of handling speech interruptions, and the audio is then transcribed through this model before being passed on to the assistant. It works better, it’s quicker and it’s free, but it lacks configuration (there is none right now). This is the same model that OpenAI currently uses for their speech-to-text service.
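Conceptually the flow is: the VAD detects the end of speech, the captured audio is run through the Whisper model, and the resulting text is handed to the assistant. Here’s a rough sketch of just the transcription step; the OpenAI-style endpoint is used purely for illustration and isn’t necessarily how or where the plugin runs the model.

```typescript
// Illustrative sketch only: transcribe a captured audio clip with a Whisper model
// via an OpenAI-compatible endpoint. The plugin's own transcription step may differ.
async function transcribe(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "speech.webm"); // the clip the VAD finished capturing
  form.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  const data = await res.json();
  return data.text; // this text is what gets passed on to the assistant
}
```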
The configuration options on the plugin element have all changed to accommodate this, and the instructions page has been updated with descriptions of what each option does, along with a screenshot.
The text-to-speech options have also been updated; you now have three to choose from. Previously, Deepgram and ElevenLabs were available, so I’ve added the OpenAI voices too, which I believe are cheaper to use.
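If you’re curious what the OpenAI voices option boils down to, it’s essentially a call to OpenAI’s speech endpoint, something like the sketch below. The model and voice names are OpenAI defaults and the plugin’s actual request may set them differently.

```typescript
// Sketch: generate speech audio from text with OpenAI's TTS endpoint.
// "tts-1" and "alloy" are OpenAI defaults; the plugin's request may use other settings.
async function speak(text: string, apiKey: string): Promise<Blob> {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: text }),
  });
  return res.blob(); // audio (mp3 by default), ready to play back in the browser
}
```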
Vision capabilities - yes the AI BOT can now see everything!
It will also measure facial expressions when Hume AI is enabled.
Update v1.30.0
This update includes Vision, which is really exciting! (well, at least for me anyway). There’s a small section at the bottom of the plugin element settings that looks like this…
When this is enabled, the plugin will attempt to start the camera device. Whenever a question is asked that would generally require some sort of sight/vision, it takes a snapshot of the current video frame and sends that, along with the user’s question, to a model hosted by Groq that’s capable of analysing the image and returning an answer to the question.
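For anyone interested, the snapshot-and-ask step is conceptually along these lines. The Groq model name below is only an example, and the plugin’s actual request is a bit more involved.

```typescript
// Rough sketch: grab the current video frame and ask a Groq-hosted vision model about it.
// The model name is an example only; the plugin's real request differs.
async function askAboutFrame(video: HTMLVideoElement, question: string, groqKey: string) {
  // Snapshot the current frame onto a canvas and encode it as a data URL
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  const imageUrl = canvas.toDataURL("image/jpeg");

  // Groq's API is OpenAI-compatible, so the image goes in as an image_url content part
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${groqKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.2-11b-vision-preview", // example model name only
      messages: [{
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the answer passed back to the user
}
```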
From my testing, this seems to be working really well and I’ve tested on various desktop/mobile devices. I’ve yet to add a way to change the camera device, but that will get added over the next couple of days.
The demo page has this enabled, so you can try it yourself by asking what it can see around you.
There is an option in the settings where you can have the camera stream pop up into a draggable element if needed. It’s good for testing, and maybe useful for other functionality. If you require any changes to this (CSS, for example), feel free to let me know.
Hi Paul,
I’m getting this type of error message even though the Groq API key is properly filled in on the token page, and the token is correctly entered on the plugin page. What did I miss?
That’s it, thank you.
I see the problem, it’s the https:// part in the referrer field on the tokenAuth page!
I’ve just fixed it for you, if you refresh it should hopefully work now (let me know).
I’ve just updated the tokenAuth page. It should now work both with and without the protocol.
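For reference, the change is essentially to normalise the referrer before comparing it, something like the simplified sketch below (not the exact tokenAuth code).

```typescript
// Simplified sketch of the tokenAuth fix: match referrers with or without the protocol.
function normaliseReferrer(value: string): string {
  return value
    .trim()
    .replace(/^https?:\/\//i, "") // drop "http://" or "https://" if present
    .replace(/\/+$/, "");         // drop any trailing slash
}

function referrerMatches(stored: string, incoming: string): boolean {
  return normaliseReferrer(stored) === normaliseReferrer(incoming);
}
```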
@pork1977gm
I’m experiencing a recurring issue with Text-to-Speech output in my chatbot, using both ElevenLabs and Deepgram. The volume within a single phrase varies significantly, with some parts of a sentence becoming almost inaudible. This inconsistency is quite noticeable and impacts the overall experience.
Have you encountered this issue before with either service? Do you have any suggestions or insights on how I could resolve it?
When it occurs, is there any other background noise going on? The default options are relatively sensitive, but it might not be related to this.
To be sure, set the “VAD lower volume” value to 1, which will disable it. Note: if you’re using the Whisper model for transcribing speech, then you can’t disable it.
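To give a rough idea of what that setting does: while the VAD detects speech, the playback volume is scaled down by that value, so 1 means no change. Very roughly, it works like the simplified sketch below (not the plugin’s actual code).

```typescript
// Simplified sketch of the "VAD lower volume" idea (not the plugin's actual code):
// while the VAD detects speech, playback volume is multiplied by the configured value,
// so a value of 1 leaves the volume untouched and effectively disables the ducking.
function applyVadDucking(playback: HTMLAudioElement, vadActive: boolean, lowerVolume: number) {
  playback.volume = vadActive ? lowerVolume : 1;
}
```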
Let me know how you get on. If you have a URL where I can try it, fire that over to me too and I’ll test it my end.
Ah ok, yeh I need to work on that. The echo cancellation and noise cancellation options for the audio stream that’s opened up in the browser aren’t particularly great, and I’ve had the same problem myself. I’ll do a bit of research into this area and try to make some improvements around it.
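For context, those options come down to the constraints the microphone stream is opened with. Below is a sketch of the standard getUserMedia audio constraints rather than the plugin’s exact code; browsers treat them as hints, which is partly why results vary so much between devices.

```typescript
// Sketch: the standard audio constraints a browser microphone stream can be opened with.
// These are hints to the browser, so the actual processing varies between devices.
async function openMicrophone(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true,  // try to remove the bot's own speech from the mic input
      noiseSuppression: true,  // try to filter out steady background noise
      autoGainControl: true,   // let the browser level out the input volume
    },
  });
}
```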
I’d love to see some finished apps or prototypes built with this AI plugin; there’s a lot you can do with it. Anyone with links to some finished products, put them up!
I’m struggling to replicate this. Can I ask where in your workflows you’re running the ‘Start microphone’ action? I’m wondering if it’s running just a little too early.
As a test, do you think you could throw in a pause action just beforehand? Then let me know what happens.
Hi @pork1977gm Hope you’re well these days. I’m encountering an error when trying to load a saved conversation.
Likewise, when I update past version 1.35.0, I start to get a console error that seems to be related to the “total_tokens” state. Below you can see both error messages. Any ideas? Thanks!