New LLM streaming plugin!

paul29 · March 23, 2024, 1:30am

Hi everyone,
I have just launched a new plugin for $55 once or $10/month that makes it super easy to connect to the major LLMs (GPT, Claude and Gemini (Grok coming in 2 weeks with others to follow)).

It supports the following:

Streaming with all LLMs
Function calling for GPT (now called tools)
GPT assistants with streaming
Webhook functionality
Protected API keys (if you are using a plugin that doesn’t protect them, anyone with a little bit of knowhow can very easily steal your API key)

There are a ton of other features that are explained in great detail on the demo page (link below).

The plugin demo page can be found here:
LLM connector demo (bubbleapps.io)

The editor can be found here:
LLM connector demo | Bubble Editor

The plugin page can be found here:
LLM with Streaming Plugin | Bubble

This came as a result of me needing a solution for all the apps I was building that all required LLM integrations. I wanted a plugin which had a bunch of features that would make things a lot easier. I continue to use the plugin in all my apps so even if there are no downloads, I will still be updating it for my own benefit.

If any of the instructions are unclear in the demo page, I will very responsive on the forums and be constantly updating the demo page to make it crystal clear.

Hope you guys enjoy.

ZubairLK · March 23, 2024, 5:23am

Who owns and maintains the intermediate server in the middle?

paul29 · March 23, 2024, 10:40pm

Sorry for the slow reply.

It’s my own EC2 instance on an AWS server.

Please let me know if you have any other questions.

paul29 · March 23, 2024, 10:42pm

I will also be releasing a feature in the next couple of weeks that allows you to make the call directly to the LLM without the middle server if you aren’t concerned about API key theft, such as an admin page.

ruimluis7 · April 3, 2024, 2:08pm

Hi Paul. What type of text display container would you recommend to take advantage of your streaming scrolling functionality?

On a different topic, I’ve had a couple of instances where the OpenAI Assistant has said that he was going to prepare a report (file retrieval or code interpretation) and has failed to come back. It’s probably the time-out that you mentioned yesterday. I will try to reproduce and send you console log.

ruimluis7 · April 3, 2024, 2:53pm

Another consideration in my use case is a user coming back and reengaging with a previous thread sometime later. I assume this is going to cause problems with the token expiring?
In my current setup, I take the OpenAI response and use a plugin to parse Markdown to BB-code so it can display formatted properly in Buuble. I assume that such a workflow step would negate the streaming?

paul29 · April 4, 2024, 4:46pm

Hi @ruimluis7
Just put a text element inside a group element and make sure you check the option “allow scrolling when content overflows” on the group element. The id of the group element should match the id field set in the “Call LLM” action.

In terms of your other question, I would have to see some error logs to determine the issue. It’s one of three things:

OpenAI assistants has some issues with file retrieval as per some forum threads on this topic where it responds saying “I don’t have teh file” when in fact it does. Not sure if this is related to your issue but it’s possible.
The action is timing out but if it was, you would get a popup saying the bubble action timed out after 30 seconds
Some other issue in the plugin that I could only diagnose with seeing your error logs.

paul29 · April 4, 2024, 4:52pm

Not really. If a user comes back a while later, you will just have to rerun the “Generate tokens” action again. This will create new tokens that can be used in the “Call LLM” action again.

In general, it’s good practice to just be calling the Generate tokens action every time a user asks a question regardless of whether they are asking their questions 5 seconds apart or 5 days apart.

Let me know if you still have further questions.

andreas4 · April 12, 2024, 6:02pm

Hi @paul29 This is great. I’ve installed the plugin and it works really well. I was testing out the call cost functionality for streaming and it doesn’t seem to work. Am I doing something wrong?

paul29 · April 12, 2024, 6:09pm

Thanks. Glad it’s working out for you. The pricing on the streaming functionality isn’t supported natively by the LLM providers so I had to defer it as a feature. It’s getting completed this weekend along with a new feature for multi-modal capabilities with Gemini. It takes about a week for bubble to approve the updates so the will be available for upgrade by about the end of next week or early the following week. I’ll post here when it’s ready for an update.

ruimluis7 · April 22, 2024, 8:11pm

Hi @paul29. Everything is cool so far.
Is there any way to format the streaming output (BBcode, HTML) to reflect the markdown generated by the OpenAI Assistant?
Also, the OpenAI assistant lets you choose API version (attached). Is this something we should adopt?
thx

paul29 · April 22, 2024, 9:22pm

Hi @ruimluis7
Thank you. Glad you’re enjoying it.

Yes, you can have gpt respond with html tags and then have the response from the plugin input into an html element instead of a text element. Like this:

Here is what the output looks like:

paul29 · April 22, 2024, 9:40pm

Also, I don’t see anythign attached.

ruimluis7 · April 22, 2024, 9:53pm

paul29 · April 22, 2024, 10:02pm

Good idea. I will include this in the next release. I am publishing a release tomorrow which will take about a week to be reviewed and approved by bubble

paul29 · April 29, 2024, 7:00pm

Took me a little longer than anticipated but I have just submitted a new version to bubble (they take about a week to review) but hopefully by Friday the new features available will be:

Vision capabilities for GPT, Claude and Gemini
Webhook for GPT assistants (without a webhook you have to continue to check to see if if the “run” has completed. Webhook functionality takes care of that for you)
Better error handling (i.e. if you chose a model that you don’t have access to, you will get a better description of what you did wrong)

In the meantime, I am working on:

Implementation of Groq (different than X’s Grok) which allows you to easily use open source models and drastically reduce token costs
Implementation of CrewAI
Adding Cohere as a provider
Ability to provide a link to your own hosted open source models

hsmyles · May 1, 2024, 9:19pm

Nice! Looking forward to that update. Quick question, when should I use the webhook functionality versus the regular server call?

paul29 · May 1, 2024, 9:26pm

Glad you like it.

Bubble actions time out after 30 seconds so if your LLM is going to take longer than 30 seconds to respond, then you need to use the webhook. Usually you would use a backend api workflow as your webhook to alert your own app when the call is complete.

paul29 · May 1, 2024, 9:31pm

V4 of the plugin has just been released. @ruimluis7 You will now have the ability to specify the version of assistants

segongora9 · May 2, 2024, 3:49pm

very exciting plugin!

Topic		Replies	Views
[UPDATED PLUGIN] - ChatGPT/LLM Toolkit - Assistant Streaming, 100+ models, custom Endpoints (eg Azure), and more! Plugins	32	1143	March 6, 2025
:robot: ᴺᴱᵂ ᴾᴸᵁᴳᴵᴺ OpenAI - Assistants with Streaming, Markdown & LaTeX, Function/Tool Calling, Files & Generated Graphics Display Support (inc. GPT 4o, o1, o3 support) [Keeps your keys secure] Plugins	27	975	November 16, 2024
:robot: ᴺᴱᵂ ᴾᴸᵁᴳᴵᴺ Flowise - Chat Streaming (inc. Markdown & LaTeX support) [Keeps your keys secure] Plugins	0	114	October 1, 2024
[FREE PLUGIN] AI Proxy - ChatGPT, OpenAI, Claude, LlaMA, PaLM Real-Time Streaming Plugins	189	21786	December 9, 2024
[NEW] The Most Complete OpenAI Plugin (Assistants, TTS, Chat stream...) Plugins	5	817	October 2, 2024

New LLM streaming plugin!

Related topics