New LLM streaming plugin!

Hey Paul…
One question: is there any chance to add streaming inside a RICH TEXT EDITOR?
I mean, keep adding/inserting content into the editor for every streaming call we make…
If we set the INITIAL CONTENT of the RICH TEXT EDITOR to a STREAMING ELEMENT, it will stream… but it will not keep the last content in the editor…
So, is there any way to do that?

Thanks - and as I said before - GREAT PLUGIN!

What do you mean by “it won’t keep the last content”? Do you mean that when the streaming is complete the content clears?

Yes… I didn't find a way to make it work… have you ever tried this?
Just a simple RICH TEXT EDITOR: you can write, then make a call to OPENAI (for example) and the streaming will happen inside the editor… after it completes, you can continue to write… and the whole content will still be there!

This shouldn't have anything to do with my plugin. Have you tried running in step-by-step mode to make sure you don't have an action that is clearing the RTE or calling the plugin action "Clear LLM Element"? If the initial value of the RTE is set to the plugin's "Streamed response" and the content disappears when the stream completes, you are likely clearing that value somewhere.

Hi Paul, I just installed your plugin. Awesome features! Is there any documentation on how to implement streaming for Flowise? Thanks in advance, Daniel

Hi Daniel,
Glad you like the plugin and thanks for downloading.

Thanks for alerting me to the fact that there isn't documentation on Flowise. I will get that added to my documentation page ASAP.

In the meantime, you can follow these instructions:
In the Generate Tokens action, put in the API key from your flow in Flowise (don't include "Bearer"):
[screenshot]

Set the messages to a text value in square brackets:
[screenshot]

Set the Flowise URL here:
[screenshot]

That being said, I have tested this on my end and found that I am getting an error. I am looking into it and will have a fix pushed by tomorrow.
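For reference, here is a rough sketch of the kind of Flowise prediction call this boils down to server side. This is not the plugin's actual code: the base URL, flow ID, and payload are placeholders, and the plugin presumably adds the "Bearer" prefix itself, which is why you leave it out of the field.

// Sketch only: a plain call to Flowise's prediction endpoint.
// FLOWISE_URL and FLOW_ID are placeholders for your own instance and chatflow.
const FLOWISE_URL = "https://your-flowise-instance.com";
const FLOW_ID = "your-chatflow-id";

async function askFlowise(question: string, apiKey: string) {
  const res = await fetch(`${FLOWISE_URL}/api/v1/prediction/${FLOW_ID}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // key from your Flowise flow; prefix added here
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ question }),
  });
  if (!res.ok) throw new Error(`Flowise error ${res.status}: ${await res.text()}`);
  return res.json(); // typically { text: "...", ... }
}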

Hi Paul,

Having some issues when trying to use a Gemini model within dynamic models.

Prompt is structured as follows:

[
    {
        "llm_provider": "Gemini",
        "model": Reu_Generate_Content's Prompt Template's Required model's API ID:formatted as JSON-safe,
        "token_expiry": "1",
        "contents": [
            {
                "role": "model",
                "parts": [
                    { "text": Conversation's Prompt's System prompt:formatted as JSON-safe }
                ]
            },
            {
                "role": "user",
                "parts": [
                    { "text": Arbitrary text:formatted as JSON-safe }
                ]
            }
        ]
    }
]

With the model ID being "gemini-1.5-pro-002", as per Google's documentation.

I then get “Chosen LLM provider and model are mismatched” and “The plugin LLM with Streaming / action Call LLM a Stream LLM Element threw the following error: Chosen LLM provider and model are mismatched (please report this to the plugin author” as error messages.

I have tried a few variations on the model ID to no avail. Is the plugin set to a static list of models, or should it pick up whatever new models are released (which I'd assume to be the case)? Otherwise, any insight into what I might be doing wrong? I'm expecting an "obvious" error on my part, but I'm stumped at the moment.

Thanks,

Hi @daniel62
I have found the Flowise issue and fixed it. There was an expired certificate (it expires every three months). I have set up server-side automation to renew it, so this will no longer be an issue.
Let me know if you need any more help getting this set up.


Hi @NoCodePhill
I am at BubbleCon right now. I can look into this issue tomorrow and get back to you. Thanks for your patience.
Paul

@NoCodePhill
I have not added that model yet.
On page load, if you inspect your network traffic, you will see the API call the plugin makes to retrieve all of the models that are available in the plugin.
Here is the URL it's calling:
https://py-llm.taistr.com/static/modelPrices.json

Can you try with one of the existing Gemini models? I will get the new model added tomorrow.
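If you want to check whether a particular model ID is in that list without digging through network traffic, something like the sketch below works. The exact shape of modelPrices.json is an assumption here; it is treated as an object keyed by model ID.

// Sketch: fetch the plugin's model list and check for a model ID.
// Assumes the JSON is an object keyed by model ID, which may not be exact.
async function isModelListed(modelId: string): Promise<boolean> {
  const res = await fetch("https://py-llm.taistr.com/static/modelPrices.json");
  if (!res.ok) throw new Error(`Could not load model list: ${res.status}`);
  const models: Record<string, unknown> = await res.json();
  return Object.keys(models).some((id) => id.includes(modelId));
}

isModelListed("gemini-1.5-pro-002").then((ok) =>
  console.log(ok ? "listed" : "not listed yet")
);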


Hi @paul29, thank you so much, I will test and let you know the results.


@daniel62 I found some time and have added that Gemini model.

Also, I have added the active models to the bottom of the demo page.

Hi @paul29. Is it possible to simply call back a response from OpenAI using your plugin… but use an alternative plugin to handle the prompt and response? That plugin includes embeddings and multiple vector stores. For context, I am using Pinecone to add files to separate vector stores associated with specific users; I vectorize both the prompts and the responses, and OpenAI simply converts it to readable text… is my understanding.

I also noticed that you added an option to pick an endpoint for the vector store, but I'm not sure how to use it (I haven't seen it in the documentation).

I'm not quite sure what you mean by "simply call back a response from OpenAI". Can you give me a bit more detail?

As for the documentation for the vector store action, I don't really have any documentation on assistants, as all of these endpoints are just set up in the plugin's version of the API connector, which is the same as what you get access to in your own API connector. The documentation to follow is OpenAI's. That being said, if you're getting an error on something, please send me a screenshot and I'll try to help you through it.
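For example, if you want to sanity-check things outside of Bubble, listing your vector stores straight against OpenAI's API looks like the sketch below. This mirrors OpenAI's own documentation rather than anything plugin-specific; the beta header is required for the Assistants/vector store endpoints at the time of writing.

// Sketch: list vector stores directly via OpenAI's REST API.
async function listVectorStores(apiKey: string) {
  const res = await fetch("https://api.openai.com/v1/vector_stores", {
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "OpenAI-Beta": "assistants=v2", // required for vector store endpoints
    },
  });
  if (!res.ok) throw new Error(`OpenAI error ${res.status}: ${await res.text()}`);
  return res.json(); // { object: "list", data: [...] }
}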

@daniel62 I realized I tagged the wrong person for the gemini model message.

@NoCodePhill this message was meant for you. Also, regarding the question you asked me via email about being able to see the response: you can just open your network traffic.


Hi Paul, I have tried to set it up but got stuck.

This is my workflow:

And the settings for each step:

Hi @daniel62

You're combining a couple of concepts here. In general, I would have a read through the documentation on the demo page (linked at the top of this thread), which will help explain more of the features.

But just to briefly explain here.

You are using the "Generate tokens" (GT) and "Server call to LLM" actions. These never go together. There are two situations:

  1. When you want to stream to your front end (client side)
  • First add the GT action, then add the "Call LLM" action.
  • The GT action is just there to protect your LLM API key.
  • The Call LLM action is there to make the streaming happen. This action requires the result of the GT action (make sure you click on the "show documentation" buttons in the actions, as there is a lot of explanation of what those fields are for).
  • Set the stream field to "yes".
  • You do not need the "Display data" step. You can use the exposed state from the plugin called "Streamed response". In a text box, set the value to "Stream element's Streamed response". This will contain the value of the streamed response from Flowise.
  • Lastly, if streaming does not work when you test your AI application in your Flowise dashboard, it will not stream into your Bubble application. There are certain nodes you can add in Flowise that prevent streaming from occurring.
  2. When you don't care about streaming
  • Use the "Server call to LLM" action.
  • Or use the setup above but set the "stream" field to "no".

Again, reading the documentation will help you better understand how the plugin is supposed to work.

This will take at least 10 minutes, so please reply when it's convenient for you.
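If it helps to picture what streaming to the front end involves, here is a rough sketch of reading a server-sent-event stream and accumulating the text the way the "Streamed response" state does. This is not the plugin's actual code; the endpoint, payload, and "data:" line format are assumptions.

// Sketch: read an SSE-style stream and build up the full response text.
async function streamLLM(url: string, token: string, onUpdate: (full: string) => void) {
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ stream: true }), // hypothetical payload
  });
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let full = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // Each chunk may contain one or more "data: ..." SSE lines.
    for (const line of decoder.decode(value, { stream: true }).split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice("data: ".length);
      if (data === "[DONE]") return;
      full += data;     // accumulate, like the "Streamed response" state
      onUpdate(full);   // e.g. whatever text element is bound to that state
    }
  }
}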


Okay, here is some context… When a user uploads a file that they would like to query inside an AI assistant that has a pre-populated vector store, I create a new thing with fields for the file itself (data) and the associated organization (current page thread).

I then vectorize the text in this step (I'm not sure if I should find a way to convert the file content to pure text).

I then save some important information related to the document (the documentation for each field is expanded).
Up to this point, everything seems to work fine and as expected; this is where things get tricky.

To answer a query related to the uploaded & vectorized doc, I use the Generate Tokens action; the visible fields are the only ones I populate.

Then I vectorize the prompt on the user’s end

I guess here I use the vectorized user prompt to query the organization ID where the document the user is querying is saved.

Steps 4 and 5 are where I believe everything goes wrong. For step 4, here are the relevant screenshots:


So I guess my question is: is there a way to use an external knowledge base with a GPT by dynamically changing the vector store endpoint when a user queries a document that the GPT is not trained on? And if this is possible with this plugin, what changes do you suggest I make to my build? As usual, if I didn't make myself clear enough, I've sent a link to my editor if you want to take a closer look.
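To make the per-organization part concrete, what I'm effectively trying to do on the Pinecone side is a namespace-scoped query like the sketch below; the index host and embedding are placeholders, and whether the plugin's vector store endpoint field maps onto this is exactly what I'm unsure about.

// Sketch: query the organization's namespace in Pinecone with the embedded prompt.
async function queryOrgDocs(embedding: number[], orgId: string, apiKey: string) {
  const res = await fetch("https://YOUR-INDEX-HOST.pinecone.io/query", {
    method: "POST",
    headers: { "Api-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      vector: embedding,      // the vectorized user prompt
      topK: 5,
      namespace: orgId,       // one namespace per organization
      includeMetadata: true,  // so the matched chunks can be passed to the LLM
    }),
  });
  if (!res.ok) throw new Error(`Pinecone error ${res.status}: ${await res.text()}`);
  return res.json(); // { matches: [...] }
}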

Thanks, I managed to get Llama and Mixtral models working this way.

I am still having issues with Gemini, however, although not the same problem. I have set the correct model ID, but now receive an "index.js:1 SSE error: {"detail":"contents must not be empty"}" error.

Although, following my debugger, I can see that values are being populated for the fields where the input is dynamic.

The JSON is set up correctly, to the best of my (and ChatGPT's) knowledge:

[
    {
        "llm_provider": "Gemini",
        "model": Reu_Generate_Content's Prompt Template's Required model's API ID:formatted as JSON-safe,
        "token_expiry": "1",
        "contents": [
            {
                "role": "model",
                "parts": [
                    { "text": Conversation's Prompt's System prompt:formatted as JSON-safe }
                ]
            },
            {
                "role": "user",
                "parts": [
                    { "text": Arbitrary text:formatted as JSON-safe }
                ]
            }
        ]
    }
]
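For reference, the raw Gemini request this should ultimately map to looks roughly like the sketch below (endpoint per Google's public generateContent documentation; how the plugin forwards the payload is an assumption on my part). If the contents array were arriving empty by the time it reached Google, I'd expect exactly this kind of error.

// Sketch: the raw Gemini REST call the contents array above corresponds to.
// The system/user text values are placeholders.
async function callGemini(apiKey: string, systemPrompt: string, userText: string) {
  const model = "gemini-1.5-pro-002";
  const body = {
    contents: [
      { role: "model", parts: [{ text: systemPrompt }] },
      { role: "user", parts: [{ text: userText }] },
    ],
  };
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    }
  );
  if (!res.ok) throw new Error(`Gemini error ${res.status}: ${await res.text()}`);
  return res.json();
}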

Thanks,

Hi @NoCodePhill
I was able to reproduce the error. Will need a day to look into the issue. Will get back to you asap
