Langchain with vector database connected to Bubble

Hi everyone,

I’m looking for advice for how to get started adding Langchain and a vector database to my GPT Bubble App to give the app custom knowledge from documents etc. I don’t see any resources on how to get this working at the moment in Bubble, so any pointers or resources that can help to get started on the right path would be a useful resource for anyone else looking for the same things.

I realise that there is unlikely to be a fully no-code solution at this point. It may be possible for GPT4 to help with coding, however it seems that GPT4 is not aware of Langchain unless the Langchain Docs plugin is used, which I don’t think is available yet.

Here are some useful things I would like to look at making use of, but I’m not entirely sure if or how these things can fit together at the moment:

Langchain UI

Steamship (hosting managed Langchain app)

I wouldn’t be surprised if someone brings out a Bubble plugin for Langchain at some point, but until then, can we tap into the wisdom of the Bubble forum and collect our resources and experiences in one place?

5 Likes

Yes, without plugin I don’t see any way to integrate langchain in Bubble. May be I will write one.

But in the mean time, use replit and write code, package into API. Call Api via the Bubble.

Share your usecase, what you want to achieve. May be I can help you.

Ankur@Nocodetalks
Looking for a Bubble Coach? Buy Bubble.io mentorship

1 Like

Thanks Ankur!

Replit seems to be the way to go at the moment. According to Steamship (linked above) I can just clone one of their templates, but coming from no-code and Bubble and not knowing how to code apart from very basic stuff, it’s really not very clear to me what I am supposed to do even just to get started by cloning their template. I see ‘fork’ but not ‘clone’ or anything similar.

Here’s what i want to achieve:

  • Offer my Bubble users the capability to upload their own documents to be used by ChatGPT or another LLM as necessary. (Currently I use the Bubble API connector to return results from the OpenAI ChatGPT API with some prompt engineering.)
  • Eventually I’d like to add other functions in Langchain such as memory and chains including other APIs for more specialised results.

So it seems once I have a handle on how to use Steamship with Replit I should have an API I can call from Bubble. What I need to figure out is how to achieve this as a no-coder. Happy to learn what’s necessary to work with the code, but I don’t know what I need to know to make this all happen.

:pray::pray:

@freelymoving

watch this video - GPT-4 & LangChain Tutorial: How to Chat With A 56-Page PDF Document (w/Pinecone) - YouTube

Try to put JS into bubble plugin and call it via the bubble. Not very hard JS though.

Thanks, I have watched the video. This is quite hard to understand from my POV (not a coder!) - I do not know much at all about JS.

I have now forked a Steamship template in Replit, and this is using Python, so I’m confused as to whether I should be trying to make sense of Python or JS. As I understand it the JS version of Langchain is not as well developed as the Python version.

Any advice as to how to customise this Steamship template and incorporate that into Bubble via API would be extremely useful.

Do you want to build Q/A chatbot kind of stuff based on the custom data?

Yes, but with the docs they upload (or the vectors) stored for querying as needed, so users can get results from the GPT API but using info from the uploaded docs in responses when relevant.

@freelymoving how big the pdf can be?

I think i can help you , need to find a solution to let upload the file. Because right now, I am doing local pdf files.

I’m not sure how big the pdfs will be, probably not huge to start with.

In Replit it looks like you can use a URL for a pdf, which would be better than uploading local files eg. to Bubble as it will run out of storage pretty quick.

1 Like

Any Langchain experts out there? Or anyone experimenting with Langchain and a vector database in Bubble? Please feel free to chime in with any tips or best practices etc.

Hey @freelymoving, I 'm a noob too but trying to do the exact same thing as you. I did see that same YouTube tutorial and want to get the same thing working inside of Bubble.

If you do figure it out or if anyone in this thread manages, please do let me know. I would be happy to pay someone a little to get this setup for me. :slight_smile:

Thanks!
Guy

Hey! I am going to build out something like this next week. Ultimately for the use case you are describing @freelymoving, I don’t think you’ll need LangChain specifically, though could help with built in functions for what I’ll list out below. Main thing is getting a vector database, like Pinecone (or saw Supabase now also supports vectors through pgvector).

I am going to give it a go with Xano to coordinate data flows and do some of the parsing. (native in Bubble is not possible)

Basic flow needed:

  1. Upload document.
  2. Parse document into chunks (small enough that you can then send to OpenAI to return word embeddings). Best if each chunk is a connected idea (so the vectors make the most sense)
  3. Send each chunk to OpenAI embedding API and get back the vectors.
  4. Store vectors, original content text, and any needed meta data to a vector database for each chunk.
  5. Create UI in Bubble to get a question.
  6. Send text of question to OpenAI embedding API to get vectors.
  7. Send vectors of question to vector database to search for similar vectors (signifying semantic similarity)
  8. Return document chunks with similarity to the question text.
  9. Send Content Text for matched document chunks along with question text in a prompt to OpenAI completions or chat/completions API (saying something like “answer this questions based on the following context” where you include all of the matched Content Text).
  10. Return answer to user question based on documents uploaded.
4 Likes

I’m sure we can do this. The fact is it’s all very early, developing rapidly, and there are not that many people doing it. New things are coming out daily so let’s see how things progress. It’s definitely possible, it’s just a matter of finding the best way to achieve the desired results.

1 Like

Hey @jeffbuze :wave:

I want to use Langchain because I’d like to be able to incorporate some of the versatility and other features later on such as chains, agents etc. I also want the workflow to be similar to the diagram below, so that the results from the LLM are augmented with your data from the vector store when necessary:

image
From: Retrieval

  1. Is there a reason why you want to use the OpenAI embedding API to create the vectors, is it not possible to do this with Langchain?

  2. Is it necessary to send this info in a prompt? I thought the vector database performed this function?

Interested to see how you get on with the vector database. I’m leaning towards Pinecone, but I have no idea what their pricing translates to in real world use. Also interested to hear how you’re using Xano as I don’t quite get how it’s being used in this case (it doesn’t support vectors right?)

I think I might have found a bubble plugin that use Langchain.

Let me know if this helps.

1 Like

I like to think of Langchain as a backend coordination layer for the different components needed to make uses cases that add memory into LLMs. (it’s a library meant to be run on backend code).

Since I don’t code (and I believe you do not either), I am suggesting using Xano as the coordination layer instead. All the components (text embeddings, vector store, LLM prompts, document loading, text splitting, etc. remain mostly the same)

For example, LangChain is actually not doing the embeddings, they offer a function to do the word embeddings across multiple providers (OpenAI, Cohere, etc.) (doc here: OpenAI — 🦜🔗 LangChain 0.0.139)

In regards to sending via a prompt (assuming you are referring to my step 9, not step 4), yes that is what LangChain would do and is actually exactly what the diagram you included shows. Standalone question text embeddings go to the vectorstore to search for similarity, it returns the documents back that are similar, and then sends them back to the LLM (as a prompt with the original questions). (Details here: Question Answering — 🦜🔗 LangChain 0.0.139)

You can see under the title Custom Prompts an example prompt with the LangChain code to include the matched documents from the semantic search to the vectorstore.

1 Like

I’m looking to create something similar where I can upload pdf and text files and get responses with GPT.

I found this video that uses a database with vectors + GPT+ bubble.

I hope it helps, I’ll follow this conversation to see if you managed to get good results.

Greetings.

6 Likes

Thanks for clarifying, I think I understand better now and I appreciate your perspective. Would love to hear how you get on with Xano.

This looks like it could be a good solution for some, depending on the use case. It looks as though you need a subscription to fine-tuner.ai, and you are limited to the use cases/templates they offer. I’m also not sure where they are storing vectors, i guess on their own database. I would want to be in control of the database and not have my vectors tied to a 3rd party like fine-tuner.ai .

Also hard to calculate costs as they have their own credit system for training and retrieval.

Personally I’m not keen to add an extra 3rd party layer in here by using fine-tuner.ai, much as it does provide a solution for usage inside Bubble via their plugin. I want more control and I don’t want to be tied to them.

I haven’t used it of course, so if anyone has please chime in!

Awesome, I’ll update this thread as I make some progress. My goal for the day is to get a vector store up and running, be able to store and query text by vectors, and do the LLM prompt to answer a question based on text context.

For now I’ll manually do it, but document parsing and breaking down into chunks will be a goal for tomorrow.

1 Like