New LLM streaming plugin!

Hi everyone,
I have just launched a new plugin ($55 one-time or $10/month) that makes it super easy to connect to the major LLMs: GPT, Claude, and Gemini, with Grok coming in two weeks and others to follow.

It supports the following:

  • Streaming with all LLMs
  • Function calling for GPT (now called tools)
  • GPT assistants with streaming
  • Webhook functionality
  • Protected API keys (if you are using a plugin that doesn’t protect them, anyone with a little know-how can very easily steal your API key; see the sketch after this list for the general idea)
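
For anyone wondering what “protected API keys” means in practice, below is a minimal sketch of the general pattern (not the plugin’s actual implementation): the browser only ever talks to your own server, and only that server holds the key. The endpoint name, model, and payload shape are just placeholders.

```typescript
// Minimal Node/Express proxy: the browser calls /chat, the key never leaves the server.
// Endpoint name, model, and payload shape are illustrative, not the plugin's internals.
import express from "express";

const app = express();
app.use(express.json());

app.post("/chat", async (req, res) => {
  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The key is read from the server environment and never sent to the browser.
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      stream: true,
      messages: req.body.messages,
    }),
  });

  // Pipe the SSE stream straight back to the client as it arrives.
  res.setHeader("Content-Type", "text/event-stream");
  const reader = upstream.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    res.write(decoder.decode(value));
  }
  res.end();
});

app.listen(3000);
```

This is the role an intermediate server generally plays: the key never has to appear in your app’s client-side code.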

There are a ton of other features that are explained in great detail on the demo page (link below).

The plugin demo page can be found here:
LLM connector demo (bubbleapps.io)

The editor can be found here:
LLM connector demo | Bubble Editor

The plugin page can be found here:
LLM with Streaming Plugin | Bubble

This plugin came out of my own need: the apps I was building all required LLM integrations, and I wanted a plugin with a bunch of features that would make things a lot easier. I continue to use the plugin in all my apps, so even if there are no downloads, I will still be updating it for my own benefit.

If any of the instructions on the demo page are unclear, I will be very responsive on the forums and will keep updating the demo page to make it crystal clear.

Hope you guys enjoy.


Who owns and maintains the intermediate server in the middle?

Sorry for the slow reply.

It’s my own EC2 instance on AWS.

Please let me know if you have any other questions.

I will also be releasing a feature in the next couple of weeks that allows you to make the call directly to the LLM without the middle server, for cases where you aren’t concerned about API key theft, such as an admin page.
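
For context, here is roughly what a direct streaming call from the browser looks like (an illustrative sketch, not the plugin’s code). Since the key sits in client-side code, anyone who opens the browser’s dev tools can read it, which is why this is only sensible on pages untrusted users can’t reach.

```typescript
// Direct browser call to a streaming chat endpoint (illustrative, not the plugin's code).
// The key is visible to anyone who opens dev tools, hence admin-page-only use.
async function streamDirect(prompt: string, apiKey: string): Promise<string> {
  const resp = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // exposed client-side
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  const reader = resp.body!.getReader();
  const decoder = new TextDecoder();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // SSE chunks arrive as "data: {...}" lines; "[DONE]" marks the end.
    // (Chunk boundaries are simplified here; a robust parser would buffer partial lines.)
    for (const line of decoder.decode(value).split("\n")) {
      if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
      const delta = JSON.parse(line.slice(6)).choices[0]?.delta?.content;
      if (delta) text += delta;
    }
  }
  return text;
}
```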


Hi Paul. What type of text display container would you recommend to take advantage of your streaming scrolling functionality?

On a different topic, I’ve had a couple of instances where the OpenAI Assistant has said that it was going to prepare a report (file retrieval or code interpreter) and has failed to come back. It’s probably the time-out that you mentioned yesterday. I will try to reproduce it and send you the console log.

Another consideration in my use case is a user coming back and reengaging with a previous thread sometime later. I assume this is going to cause problems with the token expiring?
In my current setup, I take the OpenAI response and use a plugin to parse Markdown to BBCode so it displays properly formatted in Bubble. I assume that such a workflow step would negate the streaming?

Hi @ruimluis7
Just put a text element inside a group element and make sure you check the option “allow scrolling when content overflows” on the group element. The id of the group element should match the id field set in the “Call LLM” action.

In terms of your other question, I would have to see some error logs to determine the issue. It’s one of three things:

  1. OpenAI Assistants has had some issues with file retrieval, as reported in a few forum threads, where it responds saying “I don’t have the file” when in fact it does. Not sure if this is related to your issue, but it’s possible.
  2. The action is timing out, but if it were, you would get a popup saying the Bubble action timed out after 30 seconds.
  3. Some other issue in the plugin that I could only diagnose by seeing your error logs.

Not really. If a user comes back a while later, you will just have to run the “Generate tokens” action again. This will create new tokens that can be used in the “Call LLM” action.

In general, it’s good practice to call the “Generate tokens” action every time a user asks a question, regardless of whether their questions are 5 seconds apart or 5 days apart.

Let me know if you still have further questions.

Hi @paul29 This is great. I’ve installed the plugin and it works really well. I was testing out the call cost functionality for streaming and it doesn’t seem to work. Am I doing something wrong?

Thanks. Glad it’s working out for you. Call cost for the streaming functionality isn’t supported natively by the LLM providers, so I had to defer it as a feature. It’s getting completed this weekend along with a new feature for multi-modal capabilities with Gemini. It takes about a week for Bubble to approve updates, so they will be available for upgrade by about the end of next week or early the following week. I’ll post here when it’s ready for an update.

Hi @paul29. Everything is cool so far.
Is there any way to format the streaming output (BBcode, HTML) to reflect the markdown generated by the OpenAI Assistant?
Also, the OpenAI assistant lets you choose API version (attached). Is this something we should adopt?
thx

Hi @ruimluis7
Thank you. Glad you’re enjoying it.

Yes, you can have GPT respond with HTML tags and then feed the response from the plugin into an HTML element instead of a text element. Like this:
[screenshot of the setup]

Here is what the output looks like:
[screenshot of the formatted output]
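
For anyone doing this outside the plugin’s settings, the idea is just to ask for HTML in the system prompt. A hedged sketch of such a request (model name and prompt wording are only examples):

```typescript
// Ask the model for HTML instead of Markdown so the reply can be dropped straight
// into an HTML element. Illustrative request only; model and prompt wording are examples.
const resp = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Format every reply as simple HTML using <p>, <ul>, <li>, <b> and <i> tags. Do not use Markdown.",
      },
      { role: "user", content: "Summarise the report in three bullet points." },
    ],
  }),
});
const html = (await resp.json()).choices[0].message.content; // goes into the HTML element
```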

Also, I don’t see anything attached.

Good idea. I will include this in the next release. I am publishing a release tomorrow, which will take about a week to be reviewed and approved by Bubble.

Took me a little longer than anticipated, but I have just submitted a new version to Bubble (they take about a week to review). Hopefully by Friday the new features available will be:

  1. Vision capabilities for GPT, Claude and Gemini
  2. Webhook for GPT Assistants (without a webhook you have to keep checking whether the “run” has completed; the webhook functionality takes care of that for you, see the polling sketch after this list)
  3. Better error handling (i.e. if you choose a model that you don’t have access to, you will get a better description of what you did wrong)
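
To illustrate point 2, this is roughly the polling loop you would otherwise need against the OpenAI Assistants API (a sketch; the webhook feature removes the need for it):

```typescript
// Without a webhook you would have to poll the run until it reaches a terminal state.
// Sketch against the OpenAI Assistants REST API; the plugin's webhook removes this loop.
async function waitForRun(threadId: string, runId: string, apiKey: string) {
  for (;;) {
    const resp = await fetch(
      `https://api.openai.com/v1/threads/${threadId}/runs/${runId}`,
      {
        headers: {
          Authorization: `Bearer ${apiKey}`,
          "OpenAI-Beta": "assistants=v2",
        },
      },
    );
    const run = await resp.json();
    // Terminal states include: completed, failed, cancelled, expired.
    if (["completed", "failed", "cancelled", "expired"].includes(run.status)) {
      return run;
    }
    await new Promise((r) => setTimeout(r, 1000)); // wait a second, then check again
  }
}
```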

In the meantime, I am working on:

  1. Implementation of Groq (different from X’s Grok), which lets you easily use open-source models and drastically reduce token costs
  2. Implementation of CrewAI
  3. Adding Cohere as a provider
  4. Ability to provide a link to your own hosted open source models

Nice! Looking forward to that update. Quick question: when should I use the webhook functionality versus the regular server call?

Glad you like it.

Bubble actions time out after 30 seconds, so if your LLM is going to take longer than 30 seconds to respond, then you need to use the webhook. Usually you would use a backend API workflow as your webhook to alert your own app when the call is complete.
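
Conceptually, the webhook is just an HTTP endpoint (in Bubble, a backend API workflow) that the middle server calls when the LLM finishes. A minimal sketch of such a receiver; the payload field names here are assumptions, not the plugin’s documented schema:

```typescript
// Minimal webhook receiver: the middle server POSTs here when the LLM call completes.
// The payload field names below are assumptions for illustration, not the plugin's schema.
import express from "express";

const app = express();
app.use(express.json());

app.post("/llm-complete", (req, res) => {
  const { conversationId, response } = req.body; // assumed payload shape
  console.log(`LLM call for ${conversationId} finished:`, response);
  // In Bubble, the equivalent backend API workflow would update the conversation here.
  res.sendStatus(200);
});

app.listen(3001);
```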


V4 of the plugin has just been released. @ruimluis7 You will now have the ability to specify the Assistants API version.


Very exciting plugin!
