It looks like you have that action running before the plugin has fully started up, as I can see the error appearing before it reports those other console logs that start with [aibot]…
Where do you have it in your workflows? Maybe you could try moving it into the “ai_bot_ready” event? Let me know what happens when you try that.
I’ve just added a conditional check before the state attempts to populate, which will be included in the next update, coming soon.
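For anyone following along, the idea looks roughly like this. This is only a sketch, not the plugin’s actual code: the “ai_bot_ready” event name comes from the thread, but the helpers (`runWhenReady`, `pendingActions`) are hypothetical illustration.

```typescript
// Sketch: gate any action behind the plugin's readiness so it never runs
// before initialisation finishes. Only the "ai_bot_ready" event name is
// from the plugin; everything else here is hypothetical.

let botReady = false;
const pendingActions: Array<() => void> = [];

// Assumed to fire once start-up is complete (the "ai_bot_ready" event).
document.addEventListener("ai_bot_ready", () => {
  botReady = true;
  // Flush anything queued while the plugin was still starting up.
  pendingActions.splice(0).forEach((run) => run());
});

// Conditional check before touching plugin state: queue instead of failing.
function runWhenReady(action: () => void): void {
  if (botReady) {
    action();
  } else {
    pendingActions.push(action);
  }
}

// Usage: runWhenReady(() => startMyBotAction());
```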
That did the trick! Thank you. I’m still having to stay on 1.35.0 to avoid the other error related to “total_tokens,” however. Unrelated, but the bot doesn’t seem to be able to speak when I access it via the Brave browser on the latest iPhone model. It listens, but when it speaks, there is no audio. Do you know why that might be?
I’ll take a look. It’s not a browser I’ve tested against (or heard of), but I’ll install it and see what’s going on. I’m just working on fixing a few cross-browser issues with Pinecone right now, and I’ve already fixed the total_tokens error. Should be ready tomorrow.
Can you try the update I’ve just pushed (v1.43.0)? It has some changes around the Pinecone/RAG integration so it works with all browsers now, and the total_tokens error should be resolved. I tried the Brave browser; the first time it opened up and I started a call, the audio playback didn’t seem to trigger at all, but after I restarted and tested again on both desktop and iPhone it seemed to work. It feels like there could still be an underlying issue, so I’ll keep working on it, but if you can try everything again, just keep me updated. Feel free to PM me on this. Anything I find, I’ll try and let you know.
This update provides a new visual element designed to animate with the speech coming out of the BOT. There is a new state included called “speech loudness” that can be used in the options to change the animation based on the volume of the audio.
There’s also a fix for a version incompatibility that presented itself within the VAD scripts, and a new option allowing you to select the specific Whisper model when using the Whisper service for transcribing audio to text. Groq host a few Whisper models: some are from OpenAI and there is one from Hugging Face.
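If you want a feel for how a “speech loudness” value can drive an animation, here’s a minimal sketch using the standard Web Audio API. The state name comes from the plugin; the element, the RMS mapping, and the scale transform are all assumptions for illustration, not the plugin’s internals.

```typescript
// Sketch: estimate the loudness of the BOT's audio output with an
// AnalyserNode and map it onto a CSS scale transform each frame.

function animateWithLoudness(audioEl: HTMLAudioElement, visualEl: HTMLElement): void {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(audioEl);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256;

  source.connect(analyser);
  analyser.connect(ctx.destination); // keep the audio audible

  const samples = new Uint8Array(analyser.fftSize);

  const tick = () => {
    analyser.getByteTimeDomainData(samples);
    // RMS around the 128 midpoint gives a rough 0..1 loudness estimate.
    let sum = 0;
    for (const s of samples) {
      const v = (s - 128) / 128;
      sum += v * v;
    }
    const loudness = Math.sqrt(sum / samples.length);
    visualEl.style.transform = `scale(${1 + loudness})`;
    requestAnimationFrame(tick);
  };
  tick();
}
```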
Just documenting a few updates (keepin’ the thread alive!)
Some additional tweaks have been applied around using the Vision model, although it still needs some improvements. The instructions given to the BOT about how this feature works are now exposed as an option.
The “Pinecone database only” setting has had a few tweaks. This option is model-specific, meaning you’ll need to be running one of the later Llama models for it to function as intended. The documentation section for it has been updated accordingly.
ElevenLabs have recently released their new “Eleven Flash v2.5” text-to-speech model, which is designed for extremely low-latency tasks.
Hi @pork1977gm I had a quick question. I’ve noticed that, when using OpenAI (standard, not HD) as the TTS service, it almost always cuts off or muffles the first word of every sentence. E.g. if the sentence were “the chicken crossed the road,” the bot would dictate only “chicken crossed the road.” Is this a limitation of the model, or is it something that might be improved within the plugin itself?
@pork1977gm I encountered one other issue while testing. It relates to how the bot handles interruptions. It seems that, when interrupted, the bot will respond to the interruption, as expected. However, having done this, it will then continue with the previous response, picking up from where the interruption began with no clear or logical transition. This results in disjointed and confusing responses. Have you noticed this behavior?
Silly question, but are you by any chance using Bluetooth speakers? I’ve noticed the same problem but only when using Bluetooth. It almost feels like it needs a moment to fully pick up the signal or something, but when I swap to using the built-in or hard-wired speakers in the laptop I’m currently testing on, I don’t see this problem. I’ll continue to test, but thought it was worth asking you the question anyway.
With regards to the interruptions, how it works right now is that upon being interrupted, the volume lowers to the specified value (defaults to 0.1), and if nothing is asked of the BOT, it will bump the volume back up to 1 exactly 1 second after the interruption initially occurred. During this process, the audio continues to play as normal.
I’ve added an extra option around this (defaults to false), so when enabled, any audio playback will stop upon being interrupted. Rather than just cutting off, the audio should fade out, so it doesn’t feel like such a harsh cut-off.
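To make the behaviour concrete, here’s a simplified sketch of that flow. It’s not the plugin’s actual source: only the 0.1 duck level, the 1-second restore, and the new stop-with-fade option come from the description above; the function names, the fade duration, and the follow-up check are assumptions.

```typescript
// Sketch of the interruption behaviour: duck on interruption, restore a
// second later if nothing was asked, or fade out and stop when the new
// option is enabled.

const DUCK_VOLUME = 0.1;
const RESTORE_DELAY_MS = 1000;

function handleInterruption(audio: HTMLAudioElement, stopOnInterrupt: boolean): void {
  if (stopOnInterrupt) {
    fadeOutAndStop(audio, 300); // fade rather than a harsh cut-off
    return;
  }
  audio.volume = DUCK_VOLUME;
  // The real plugin checks whether anything was asked of the BOT before
  // restoring; that check is omitted here for brevity.
  setTimeout(() => {
    if (!audio.paused) audio.volume = 1;
  }, RESTORE_DELAY_MS);
}

function fadeOutAndStop(audio: HTMLAudioElement, durationMs: number): void {
  const steps = 10;
  const stepDown = audio.volume / steps;
  const interval = setInterval(() => {
    audio.volume = Math.max(0, audio.volume - stepDown);
    if (audio.volume <= 0) {
      clearInterval(interval);
      audio.pause();
    }
  }, durationMs / steps);
}
```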
See how you get on with that. If you can think of any better ways to handle interruptions then just let me know, I can make any suggested changes in an effort to make the process feel a little more natural.
Be aware, VAD can be triggered if you’re using external speakers and the volume is high.
Hi there! Thanks so much for the quick follow-up. I’m not using Bluetooth speakers. I was testing on mobile (iOS) when I noticed the problem, and it is very consistent there. I’ll need to confirm later today whether the issue is also present on my laptop. With regard to the interruptions, I think I might not have explained the problem clearly before: the bot continues with the audio that was being played prior to the interruption, even when the user does ask something upon interrupting. A hypothetical exchange to illustrate:
User: Sing “Happy Birthday.”
Bot: “Happy birthday to you. Happy birthday …”
User: “What day is it today?”
Bot: “Friday.”
Bot: “… to you. Happy birthday, dear …”
It seems as if, even though the bot might discontinue playback of the current chunk upon interruption, it still plays subsequent chunks from the response that began streaming prior to interruption. In the office now, but I’ll try to record a video later to reproduce the behavior.
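In case it helps, this is the kind of thing I imagine is going on, purely as a hypothetical sketch (none of these names are from the plugin): if streamed chunks sit in a playback queue, stopping only the current chunk still lets the queued ones play afterwards.

```typescript
// Hypothetical illustration of the suspected cause: an interruption should
// clear the queued chunks and ignore late-arriving chunks from the
// cancelled response, not just pause the chunk that is currently playing.

class ChunkedPlayback {
  private queue: HTMLAudioElement[] = [];
  private responseId = 0; // bumped whenever a response is cancelled

  enqueue(chunk: HTMLAudioElement, forResponse: number): void {
    // Drop chunks belonging to a response that has since been interrupted.
    if (forResponse !== this.responseId) return;
    this.queue.push(chunk);
  }

  interrupt(current: HTMLAudioElement | null): void {
    current?.pause();        // stop what's playing now…
    this.queue.length = 0;   // …and discard everything queued behind it
    this.responseId += 1;    // late chunks from the old stream get ignored
  }

  currentResponseId(): number {
    return this.responseId;
  }
}
```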
That’s great to hear, thank you very much. It’s not perfect though and still needs a bit of work, but I hope to get some vision models from Hugging Face implemented at some point through a Hugging Face plugin I’m currently working on. It’s not yet complete, but the demo page I’m in the process of building out is here:
Regarding the Vision feature, it currently uses a model which Groq still have in preview mode, so that could change. It’s something I’m keeping an eye on. Once I have the Hugging Face plugin completed, I’m hoping to use the various vision related models hosted through the Inference API but I’m not quite there yet.
Yeah, I see DeepSeek. It’s a model users have been asking Groq to host as well, although it’s a paid service from what I can see, so I doubt that will happen. It looks like they only host their models on Hugging Face for others to download and set up themselves in a server environment. It doesn’t look like you can use any of the Hugging Face libraries to load those models, but I could be wrong.
Yeah. Scalability-wise, and for your own internal tech, having your own instance running on a server is handy, especially when Groq has rate limits per minute/day, etc.
Will wait until these evolve a little more.
But it’s interesting that you can now just get hold of a leading open-source AI and plug it in yourself lol