Hey folks,
If you're using OpenAI + Pinecone to generate embeddings and store vectors, this API might be of use to you. Prior to creating this API, my Bubble workflow was:
- Convert the file into text
- Split the text into chunks of n words
- Run a recursive workflow to get embeddings for each chunk of text in batches of 100
- Upsert each batch of embeddings into the Pinecone database
- Go to the next batch
This used HEAPS of WUs and was really slow. I’ve just created a simple API that will handle it for you and I’m putting it here for free (again, this is a ‘take it or leave it’ thing and I’m making no guarantees as to its reliability). The files are here: https://www.dropbox.com/scl/fo/hojftof2g9tyx7y8eh9ew/h?rlkey=2hd0yay77rbzp26j85sx8i9bo&dl=0
To test, I was able to vectorise and upsert an entire Harry Potter book (80,000 words) in <30 seconds. Previously that would’ve been like half an hour in my old workflows…
If you would like implementation support (uploading to Google Cloud Platform, integrating into a Bubble app), I'm available to help with that for a fixed fee; PM me for more info.
Guide
API request from Bubble:
{
"content": <content>, # the content you want to embed
"wordLimit": <wordLimit>, # the chunk size in number of words (e.g 150 will split the text into vectors of 150 words each)
"uniqueID": <uniqueID>, # a unique ID for this set of vectors/this request that can be used to reconcile it with your Bubble DB
"pineconeURL": <pineconeURL>, # URL of your Pinecone index (https://indexurl.pinecone.io/) - don't include any /vectors or /upsert path suffixes
"pineconeAPIkey": <pineconeAPIkey>, # Pinecone API key
"openAIAPIkey": <openAIAPIkey>, # OpenAI API key (sk-xxxxxxx)
"webhookURL": <webhookURL>, # Bubble webhook URL to return progress updates to
"namespace": <namespace>, # namespace to upload to
"category": <categoryID> # each vector gets a meta value that you can set here. I use it for categorisation of vectors for querying later.
}
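For anyone testing outside Bubble's API Connector, here's roughly what the request looks like from Python. This is a minimal sketch: the endpoint URL and all the values are placeholders, and the field names follow the spec above.

import requests

payload = {
    "content": "The full text you want to embed...",
    "wordLimit": 150,                                # chunk size in words
    "uniqueID": "doc_12345",                         # your own reference ID
    "pineconeURL": "https://indexurl.pinecone.io/",  # index URL, no path suffix
    "pineconeAPIkey": "YOUR_PINECONE_KEY",
    "openAIAPIkey": "sk-xxxxxxx",
    "webhookURL": "https://yourapp.bubbleapps.io/api/1.1/wf/vector_progress",
    "namespace": "my-namespace",
    "category": "books",
}

# Replace with wherever you've deployed the API (e.g. a Cloud Run URL)
response = requests.post("https://YOUR_API_URL/", json=payload, timeout=30)
print(response.status_code, response.text)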
While the embedding + upserting runs, the API will send progress updates to the webhook URL after each batch completes (batches of 100 by default):
{
"processed": 1, # the number of vectors processed in this batch
"total": 1, # the number of vectors in the request in total
"uniqueID": <uniqueID>, # the uniqueID you specified in the initial request
"uniqueIDs": [<id1>, <id2> etc] # list of unique IDs for each vector. You need to save these somewhere in Bubble as you use a vector's unique ID to delete it
}
I create a vectorGroup data type in Bubble for each request. A vectorGroup is what I use to describe one memory upload (e.g. one document or one text); on the vectorGroup I store its name, upload status, list of vector IDs, etc. When the webhook comes in, I update the 'processing' count on the vectorGroup, and when the number of processed vectors equals the total number of vectors I change the status to complete.
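To make that reconciliation logic concrete, here's a rough sketch written as a generic Python (Flask) webhook handler rather than a Bubble backend workflow. The vector_groups store and its field names are just illustrative, not part of the API.

from flask import Flask, request

app = Flask(__name__)

# Hypothetical in-memory stand-in for the vectorGroup data type in Bubble:
# uniqueID -> {"processed": int, "vector_ids": list, "status": str}
vector_groups = {}

@app.route("/vector_progress", methods=["POST"])
def vector_progress():
    data = request.get_json()
    group = vector_groups.setdefault(
        data["uniqueID"], {"processed": 0, "vector_ids": [], "status": "processing"}
    )
    group["processed"] += data["processed"]        # add this batch's count
    group["vector_ids"].extend(data["uniqueIDs"])  # keep IDs for later deletion
    if group["processed"] >= data["total"]:        # all batches done
        group["status"] = "complete"
    return "", 200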
Testing
You can test it at https://flexgpt.io, there’s a little bit of free credit to try uploading with.
Notes
- I haven't included any auth, so anyone that can find out the URL you're hosting the API at could technically use it for their own app (although you could probably see their API keys and data, so that'd be dumb…)
- ChatGPT and I aren’t perfect Python developers, this probably isn’t the most efficient script in the world and you might be able to improve it.
- Batch size is 100 by default, as Pinecone recommends batches of no more than 100 (see the sketch after these notes)
- Make sure the wordLimit isn't too high, as OpenAI can only handle 8,191 tokens per embedding. 150 words is a pretty good balance between not being too huge and still giving enough context for the LLM.
- The number of tokens returned is actually the number of words in the content (a word count, not a true token count)
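If you're curious what the script does under the hood (or want to tweak it), here's a rough sketch of the chunk → embed → upsert loop. It calls the OpenAI and Pinecone REST endpoints directly and skips retries and error handling, so the actual script in the Dropbox folder may differ in the details.

import uuid
import requests

def chunk_words(text, word_limit=150):
    """Split text into chunks of at most word_limit words."""
    words = text.split()
    return [" ".join(words[i:i + word_limit]) for i in range(0, len(words), word_limit)]

def embed_and_upsert(content, word_limit, pinecone_url, pinecone_key,
                     openai_key, namespace, category, batch_size=100):
    chunks = chunk_words(content, word_limit)
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        # One embeddings call per batch of up to 100 chunks
        emb = requests.post(
            "https://api.openai.com/v1/embeddings",
            headers={"Authorization": f"Bearer {openai_key}"},
            json={"model": "text-embedding-ada-002", "input": batch},
        ).json()
        vectors = [
            {
                "id": str(uuid.uuid4()),
                "values": item["embedding"],
                "metadata": {"text": chunk, "category": category},
            }
            for chunk, item in zip(batch, emb["data"])
        ]
        # Pinecone recommends batches of no more than 100 vectors per upsert
        requests.post(
            f"{pinecone_url.rstrip('/')}/vectors/upsert",
            headers={"Api-Key": pinecone_key},
            json={"vectors": vectors, "namespace": namespace},
        )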