Help with a semantic search and AI assistant (OpenAI, Pinecone)

*** preferred people/agencies ***

This is important. If you expect me to hand off a long, detailed spec doc and you simply want to crank out stuff like an order-taker, please don’t reply. I don’t work that way. I am interested in someone/agency who works with clients to understand what they need and then we can build stuff collaboratively. This will lead to more future work if you are a good partner. I do not want to work through a project manager, so if your agency works this way, do not reply. I am not interested in talking to a sales person, or project manager.

*** you must be ***

  • very strong with vector databases (BTW I think this is the best way to do this project, but if training OpenAI through the assistants API makes more sense, I’m good with that too. I have setup and made a prototype in Pinecone, but I’m not married to it.
  • very strong with AI assistants. I am using Open AI because Gemini and Claude don’t offer API access in Canada yet.
  • willing to pair program occasionally

***the project ***

I want people to be able to ask my assistant questions about change management and have it answer based on my existing content in bubble, as well as the multiple books I’ve written. The assistant will only give answers and insights based on my content, which is all stored in bubble. My content is basically structured like a periodic table in chemistry. Meaning, you can combine ‘my elements’ to create a customized approach to change management that fits your context.

I have 135 elements that have multiple categories, short/long descriptions and some other metadata. Each element has about a page of content (400 to 800 words). These elements are related to each other in multiple ways. For example, if a user asks “how can I get my leaders to buy into the change?”, the assistant would find 5 or 6 elements telling the user something like: “you could use but you would need to . Here are that might help”. they might also ask “how is different from traditional approaches?”

Essentially, what I am building is a replacement for myself so users can ‘ask the change management expert’ and it’ll reply based on my content/books/ideas.

**What I want:

Coaching: While I have build a simple prototype, I am not familiar with the best way to setup Pinecone for this or if vector search is the best way to do this. Right now each ‘element’ is it’s own vector, with meta-data. I am happy to pair, or take advice provided you are a rockstar with semantic search and understand how vector databases can help me create this.

Building/Api: I am reasonably proficient in bubble so I can build this myself, but I want to speed up this project. This is what I would be building:

  • enhancing my existing generic prompt/response capability (this is built now using the assistant’s API and it allows users to generate automatic summaries of various things in my app. ) I want to make this re-usable.

  • token/cost tracking: I have an existing subscription model for users but I need a way to track token usage and cost, and have some type of rate limiting.

  • semantic search (what I described above): if all goes well with the Elements, I would want to extend this to my other content, especially my ‘resources’ section which has a ton of videos, articles and links to helpful things for my community. This may even become the search feature on my website.

  • bonus project: I have a ton of survey data that I’ve uploaded to a CustomGPT in CSV format and I want to replace that so users don’t need a ‘plus’ subscription to use it. This would likely involve using the Assistant API where I can feed the survey data in real-time for my users to analyze.

My budget is $2,000 CAD.

1 Like

As your project seems rather complex and requires a lot of know how besides Bubble, you need an expert in various areas. A good bubble developer charges 80+$ per hour. (Excluding other skills that you also need)

Putting your budget into consideration, unfortunately, I doubt that you will end up with a satisfying result that matches your demands.

Either way, I hope you find a trustworthy person who can build what you need.

I mean, all of this is a 12 hour job if you already know what you’re doing, so I’m sure he’ll be fine. The hard part is the filtering 30 DMs he’s about to receive. Fortunately it’s always easy to filter out 25 of them straight away…

@agiledood just don’t cheap out or you’ll end up spending more and taking more time than paying a little more per hour for a good (but faster) developer :wink: Best of luck!

I think the majority will be coaching and pairing. My prototype already upserts to pinecone and the querying works as well, but the results kinda suck.

I’ve hired folks through this forum before, usually around 80 - 120 per hour which for my budget is around 18-20 hours so with time and cost fixed, scope is the variable.

What sucks specifically? How large are your chunk sizes? How are you splitting them (by number of characters, words, paragraphs?) Are you attaching useful metadata to each vector like the document it comes from?

The best approach I’ve found to integrate Bubble with Pinecone for knowledge query is:

  • convert files to text
  • split text by paragraph - if a chunk is too long, split it by sentence (to best preserve meaning within a chunk). Normally this is 150-words-ish.
  • each Document/File is a Bubble Thing. e.g if you’re upserting a PDF called ‘Introduction’, you’d save Introduction as a Document in the Bubble DB, with its file, its extracted text, and any other useful info.
  • upsert each vector to Pinecone, with the unique ID of the file it comes from as metadata. This means that when it’s returned from the query, you can locate the file it comes from, and display useful info (e.g you could get GPT-4 to cite the document inline (you can see in the screenshot below I include the relevant source at the bottom of the message)

Hey @agiledood ,
Dropped you a message, awaiting your response. :smile:

Hi @agiledood,

We can absolutely help you in finding and matching to the perfect one. We have over 600+ Developers 100+ of whom are Certified Bubble Developers offering their services at different hourly rates depending on their skillsets and years of experiences in Bubble. Kindly check my DM. Thanks.

I think it is the way I’m chunking it. The text is stored in fields, not files. Most of the text fields are small, 50 to 150 words, and the main details of each ‘thing’ is around 400 to 600 words.

Each element has about 10 fields of data, and I’m creating 1 vector per element and using the UniqueID of that thing for the Pinecone document ID. Each element has 4 metadata fields (name, category 1, category 2, url to details page)

I think that’s the best way to setup the vectors because when I add the thousands of resources, I’ll use the same metadata fields to show those resources are linked to those elements.

I think the results suck because I haven’t created a good instruction set for my GPT. I might try that stuff first.

Thanks to those who DM’d and/or replied, I’ve picked a couple people and I don’t need any other replies for this!

1 Like