I’ve just launched a new AI BOT plugin that utilises Groq, Deepgram and ElevenLabs, enabling users to communicate with the BOT through speech or text.
Additionally, you have the option to customize the BOT with your own data, allowing it to answer questions based on the information you’ve provided to it.
For those who don’t know… Groq is a large language model provider, or to be more precise, an AI infrastructure company that builds inference technology. Take a look here: Why Groq - Groq
Their streaming speeds are ridiculously quick compared to other AI providers I’ve come across, which is why I’ve used them in this plugin.
They’re also free to use, but for how long I don’t know.
The plugin sends/receives data directly from the user’s browser to keep communication efficient and interactions quick.
API keys are protected through the use of tokens. You’ll need to create a simple account on an authorization app I’ve set up to register any keys you might be using (explained in the instructions).
Some of the features included are:
It can remember conversational history to form more personalized responses.
Response length can be limited through a token cap, so replies don’t run too long.
A conversational state has been included that can be used as the data source for repeating groups to create a chat type of interface.
Access to real-time Google data is available with additional configuration using SerpAPI (Search Engine Results Page).
You can ask the BOT to generate images based upon a textual description of how you’d like it to look. Various image settings are included around this functionality.
Includes a voice activity detector that can interrupt the BOT. The volume can be lowered when this happens.
Microphone streaming options are provided to help reduce background noise or other interference.
You can preload the BOT with conversation history from a previous period in time.
Supports both Deepgram and ElevenLabs text-to-speech services (each has its own advantages).
You can tell the BOT how you want it to behave, for example “You are a very helpful assistant who just loves to talk” or “You are calm and take on a methodical approach to questions” etc. The information you provide determines how it responds (there’s a rough sketch just after this list of how this and the token limit map onto the underlying request).
There are 12 avatars to choose from when enabling the text to speech service from Deepgram. These all have different settings, voices and colors stored in a state to help design your UI.
ElevenLabs uses more natural voices, which are defined when you log in to their site. They provide a much broader range of voice configuration and also give you the ability to clone your own voice, which can then be used in this plugin.
Function calling is set up, allowing additional requests to other providers to expand the BOT’s capabilities. Function calls cannot be user-defined at the moment.
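To give a rough idea of what the behaviour instructions and token limit translate to under the hood, here’s a minimal sketch of a Groq chat-completion request (Groq’s API is OpenAI-compatible). This isn’t the plugin’s internal code – the function name and the 256-token cap are just illustrative:

```ts
// Hypothetical sketch only - the plugin handles this internally.
async function askBot(userMessage: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "llama3-70b-8192",  // model used on the demo page
      max_tokens: 256,           // caps how long the response can be
      messages: [
        // The "behaviour" text becomes the system prompt.
        { role: "system", content: "You are a very helpful assistant who just loves to talk" },
        { role: "user", content: userMessage },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the BOT's reply
}
```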
Hope it’s a useful tool for any businesses out there running on Bubble!
New option added to allow the BOT access to your location. When enabled, it will know where the user is and have knowledge of the surrounding areas. This is enabled on the demo page, so you can ask it where you are to test it.
Next update will be integrating it with Google Search Results and Function Calling.
Function calling has been added to the plugin and the BOT now has access to real-time data from Google. The relevant setting for this is shown below; the feature can be enabled or disabled.
This requires an additional key to be set up in the token auth page to keep it secure. I’ve tested it quite extensively and ironed out a few issues that tend to come about with function calling. It seems to be working relatively well.
There’s also a new state indicating when the AI is waiting for a response to the user’s initial request, allowing you to do this…
Right now, I haven’t included a way to define your own function calls within the plugin, simply because I feel it’s quite a bewildering process for most people to get their head around. I need to find a more elegant solution rather than having you define a list of tools with each function spelled out. It’s very much a dev task at the moment that I’d rather not expose through JSON fields and over-complicate the plugin’s use. I’m currently working on this, along with allowing access to the database.
Image creation has been included and the BOT is now capable of generating images based on textual content. This is a pretty cool feature: the generated images are very good, and I’ve added 3 options to control the image style, currently photorealism, art and anime.
Images can be saved, and there are now 2 additional fields in the ai conversation state. The “image_html” key contains the HTML with the image placed in an img tag, so you can use it directly within an HTML element. The “image_file” key contains the actual file, which can be downloaded/saved to your database.
Here are some examples of what it can do (the functionality is currently enabled on the demo page to play with).
This is incredible. That said, I’m encountering some pretty big bugs when using the demo. Whenever I interact with the bot, it only seems to ever register the last sentence from whatever I said. Is this a known issue? I’d love to purchase, but this is obviously a non-starter.
Sorry, I’ve been playing around with the demo site recently and the “Llama3-70b-8192” model was having some issues with slow response times the other day. The demo page was using Google’s “gemma2-9b-it” model yesterday, which may have behaved slightly differently when you were testing, but it’s now back on the “Llama3-70b-8192” model again.
That said, I’ve just been playing with it and it seems to be fine. When you say “register the last sentence from whatever I said” - how do you mean exactly?
Thanks so much for the quick follow-up. I’m afraid I’m still encountering the same issues today. Whenever I speak with the bot, it only picks up the last couple of seconds of whatever I said. So if I say something like “I’m doing really well, actually. Thanks for asking,” it might just register, “Thanks for asking.” For context, I’m encountering the same issue on both MS Edge and Brave browsers. I’m on an HP laptop.
Don’t know if this would make a difference, but I should also note I wasn’t using headphones. Just my laptop’s built-in speaker, so maybe an audio sensitivity thing?
Also, unrelated, but I’m curious why you didn’t opt to use Scraper for the SERP integration. I noticed you have a Scraper plugin, and it seems like their prices are a good deal lower.
Yeh, it could just be a mic thing. I’ve double- and triple-checked it and run a load of tests during the process, and I’d say at least 90% of the time it was working well for me. A couple of times I also ran into the same problem you did, but I’m finding you have to speak pretty clearly and have little background noise for it to work well.
One thing I do inside the plugin is create an additional audio worklet node that I attach to the audio stream coming from the local device’s audio input (the mic in our case); it monitors the audio levels to produce the values seen within the “microphone level” state.
Beforehand, this wasn’t something that could be turned off, but I’ve just made a minor adjustment, which you’ll see in the image below, that could help. It’s certainly worth testing anyway.
I’ll do some further testing on it tomorrow and report back with anything I find. If you have the plug-in installed, then try the update I’ve just pushed and let me know if it helps in any way. I’ve disabled it on the demo page also so you can try it there too. Do you run into this problem if you try it on mobile?
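For anyone curious how that worklet wiring generally looks, here’s a minimal sketch of the idea, not the plugin’s actual code – the “level-meter.js” processor name and the message shape are assumptions for illustration:

```ts
// Sketch: tap the mic stream with an AudioWorkletNode that reports levels.
async function monitorMicLevel(onLevel: (rms: number) => void): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  // Register a worklet processor that computes an RMS level per render quantum
  // and posts it back to the main thread.
  await ctx.audioWorklet.addModule("level-meter.js"); // hypothetical processor file
  const source = ctx.createMediaStreamSource(stream);
  const meter = new AudioWorkletNode(ctx, "level-meter");
  meter.port.onmessage = (e: MessageEvent<number>) => onLevel(e.data); // feeds the "microphone level" state
  source.connect(meter); // tap the mic; no need to connect to the destination
}
```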
Unrelated, but I also updated the doc section for the ‘Allow start on load’ setting.
I’ll continue to play with it tomorrow and see what else I can find.
Regarding SERP, the main reasons for not using ScraperAPI were speed and the consistency of structured JSON in the response. I may take another look at it again.
This week I’m working on adding the ability to create embeddings, upload them to Pinecone and have the BOT fully trained on all the data.
That’s very odd. I’m still encountering the same problem with the demo, speaking clearly in a quiet room. I’ve even tried on a different device to make sure it wasn’t an issue with my hardware. I love the promise of this plugin, but it consistently only picks up the last several words of whatever I say.
Update v1.10.0
This update introduces retrieval-augmented generation (RAG), allowing the BOT to source custom data directly from a vector database held within Pinecone. It includes the ability to upload data from text or PDF files and answer questions based on relevance.
Enabling
If you want to use your own data, either as the only data source for answering questions, or using both the LLM and your own data, you can now enable the options shown below.
Because the plugin uses Pinecone, it will require an API key to use (a free one is sufficient for this), and it’ll also need a key from OpenAI. The OpenAI key is only used for generating embeddings, which are numerical representations of your data; it’s these that get stored in the database.
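Roughly speaking, this is what the two keys get used for behind the scenes. This is just a sketch against the public OpenAI and Pinecone REST APIs, not the plugin’s internal code – the index host, ID and metadata are placeholders:

```ts
const PINECONE_INDEX_HOST = "your-index-xxxx.svc.us-east-1.pinecone.io"; // placeholder

async function embedAndStore(text: string, openaiKey: string, pineconeKey: string) {
  // 1. Turn a text chunk into an embedding vector via OpenAI.
  const embRes = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${openaiKey}` },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  const embedding: number[] = (await embRes.json()).data[0].embedding;

  // 2. Upsert the vector into the Pinecone index, keeping the original text as metadata.
  await fetch(`https://${PINECONE_INDEX_HOST}/vectors/upsert`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "Api-Key": pineconeKey },
    body: JSON.stringify({
      vectors: [{ id: crypto.randomUUID(), values: embedding, metadata: { text } }],
    }),
  });
}
```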
Configuring
There is a new section with 3 options in the main plugin settings just for Pinecone.
Uploading
Use the “Train AI BOT” action to upload data to your vector database. There are two additional options as shown below. All you need to do is supply either text or PDF and it will take care of the rest.
Update v1.12.0
This update includes a feature for deferring function calls. It’s the first part of adding the required bits to make this work. The same type of deferring will also exist for custom function calling (when it comes) which you’ll be able to set yourself through the TokenAuth page for the BOT to use, giving you the flexibility to call any endpoint and have the results put back into the conversation.
For now, this update focuses on the actual deferring, how it works and other changes that have been included.
Transcripts currently being played (spoken by the BOT) will be stopped if a text input message is sent before the last transcript has been fully read out.
Improved the response from the BOT when using the “Pinecone database only” option.
Improved the response from the BOT when it describes a generated image upon using the ‘output description’ option.
New setting under the AI Bot Options called “Error custom message”. This controls the response from the assistant when a chat completion is successful but there’s no message content available (returned from the LLM). Whilst this shouldn’t happen, it can happen when using a model that doesn’t work well with function calling, for example, or when the AI gets confused. The default message when this happens is: “I’m sorry, could you repeat that for me? I didn’t quite get that.”
Image generation abilities
The image generation is now capable of returning multiple images: you can either ask it to generate X number of images, or occasionally you’ll find the BOT decides to give you some alternatives anyway. This can be dependent on the model selected, but it seems to work best using the ‘Llama 3 Groq 70B Tool Use’ model. As a result of this change, the ‘ai conversation’ state now contains a list of images where each image holds the image_file and image_html fields.
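As a rough guide, each entry in that list looks something like the shape below. This is an assumed sketch for illustration – check the plugin’s state in the debugger for the exact field names beyond image_file and image_html:

```ts
// Assumed shape only, for illustration.
interface GeneratedImage {
  image_file: string; // the actual file, which can be downloaded/saved to your database
  image_html: string; // ready-made <img> markup for use directly in an HTML element
}

interface AiConversationEntry {
  role: "user" | "assistant"; // assumed field names
  content: string;
  images?: GeneratedImage[];  // now a list - one entry per generated image
}
```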
Function call deferring
Requires this option to be populated…
A new option called “Defer this function’s call results” has been added into the Getimg.AI (Text To Image) Options. When checked, the BOT will disregard responses from these function calls and not display images within the conversation history. The call is still made and any data returned (including any parameters etc) will be available in the ‘function call results’ state, allowing you to use the data in other ways. The event called ‘function call results populated’ will then be triggered.
New setting under the Getimg.AI (Text To Image) Options called “Custom response”. This allows you to define a custom response from the BOT when deferring function calls. The default response is empty, meaning the BOT will not reply with anything. You can use this to supply a custom response if needed.
New list state called ‘function call results’ and an event that triggers every time this state is populated called ‘function call results populated’. Data will appear in the state regardless of whether the deferring is enabled or not. This gives you all the details for each function call that runs and it is what’s needed to feed into the action below.
ACTION - Insert function call results. This action allows you to load the results from a function call back into the conversation. It inserts the relevant context at the position where the user's query was generated. If the conversation history is cleared or the user's question cannot be found, then it's inserted at the end of the conversation.
Going forward, there may be a few tweaks added to all this based on usability, so if you have any questions, problems etc. then just let me know.
There are a few changes made to the demo page to support this functionality, and you can now change the image generation type to see the differences between art, anime and photorealism.
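In pseudo-form, the insertion behaves along these lines. This is an illustrative sketch of the logic described above, not the plugin source:

```ts
type Message = { role: "user" | "assistant" | "function"; content: string };

function insertFunctionCallResults(
  history: Message[],
  userQuery: string, // the message that triggered the function call
  results: string    // data from the 'function call results' state
): Message[] {
  const idx = history.findIndex((m) => m.role === "user" && m.content === userQuery);
  const resultMsg: Message = { role: "function", content: results };
  if (idx === -1) {
    // Conversation cleared or the user's question not found: append at the end.
    return [...history, resultMsg];
  }
  // Otherwise insert the context right after the originating user query.
  return [...history.slice(0, idx + 1), resultMsg, ...history.slice(idx + 1)];
}
```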
This update pushes a big change to the tokenAuth page. There’s now a function call button which launches a UI containing all the options that are needed to create your own custom calls.
You set up an API Call first, include any parameters within the URL or the BODY of the request, and then test the call to see the JSON returned. After that, create the Function Call and populate all the required fields which informs the BOT of this new tool and how to utilise it.
I’ve included a button that says “add a real example” which will set up a test function call for you. This populates all the details with a real test call to the openrouteservice API that returns step-by-step instructions from one destination to another. It will automatically be enabled.
To test it, just head over to your page where the BOT is running and ask it something like:
Can you get me the route using the following coordinates
Starting at: -0.1246254, 51.5007292 which is Big Ben, London
Ending at: -0.0235333, 51.5054306 which is Canary Wharf, London
Since it’s just an example, you can’t simply give it the name of a city; it requires a pair of longitude/latitude values for the start/end parameters. There’s nothing stopping you from getting those values from a Bubble search box and passing them dynamically into the above message as an Arbitrary value in the “Send AI BOT Message” action.
Give it a go, and if you run into trouble or require changes to the UI to support different API requests, then please let me know and I’ll do what I can. The options around the API call part are relatively basic at the moment, but they will more than likely be expanded as time goes on.
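For reference, the details you fill in boil down to an OpenAI/Groq-style tool definition along these lines. This is illustrative only – “get_route” and the parameter names are assumptions, and the actual fields on the tokenAuth page may be labelled differently:

```ts
// Hypothetical tool definition matching the openrouteservice example.
const getRouteTool = {
  type: "function",
  function: {
    name: "get_route",
    description:
      "Get step-by-step directions between two points using the openrouteservice API.",
    parameters: {
      type: "object",
      properties: {
        start: {
          type: "string",
          description: "Start point as 'longitude,latitude', e.g. '-0.1246254,51.5007292'",
        },
        end: {
          type: "string",
          description: "End point as 'longitude,latitude', e.g. '-0.0235333,51.5054306'",
        },
      },
      required: ["start", "end"],
    },
  },
};
```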
I came across your plugin this weekend and just want to say that this is incredibly good work. Very impressive in its design, capabilities, documentation, and support for solid AI UX. Exemplary. Will be experimenting with it in the coming days. Cheers!
This update includes some modifications allowing the BOT to stream either audio or text directly to Hume’s expression measurement service to analyse the data and return a list of emotions.
The demo page currently has this enabled and the responses are seen in the repeating group in the bottom left of the page.
The tokenAuth page now has an area to store the Hume AI key, and there’s a new state called “emotions” that contains 2 further lists: a list of voice emotions and a list of chat emotions. The voice emotions list is populated when you communicate with the AI BOT using the microphone, whilst the chat emotions list is populated when you send messages via the send message action.
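As a rough guide, Hume reports each emotion as a name/score pair, so the two lists likely look something like the sketch below. The field names here are assumptions for illustration, not the plugin’s exact state keys:

```ts
// Assumed shape for illustration only.
interface Emotion {
  name: string;  // e.g. "Excitement", "Anger"
  score: number; // confidence/intensity value
}

interface EmotionsState {
  voice_emotions: Emotion[]; // populated when speaking to the BOT via the microphone
  chat_emotions: Emotion[];  // populated when using the send message action
}
```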
The instructions page has a number of references to how this works and a lot of screenshots and other text have been updated.
This doesn’t use the empathic voice interface from Hume.
Is there a way to separate the more positive emotions from the negative or flat/dry ones?
If you look at the image above, you’ll notice excitement would be within the same number range as anger, so any settings to correspond with colours (green for positive, red for negative) can’t be done, right?