OpenAI / Conversation with File Upload

This is more of a journal entry, but I don’t wish my suffering on anyone else, so I figured I’d pass along what I’ve learned with this side project… with the hope of feedback or ways to improve!

So there are already a handful of methods to have a conversation with a file. From the material and videos I found, a lot of them steered towards RAG or a memory database. But I had trouble researching how to keep a conversation, the memory, and responses tied to one specific document, as I didn’t want responses contaminated with knowledge or material from other documents. The information I found also targeted simple PDF files and more of a “single question” use case, rather than a full conversation with beefy financial/Excel files like mine.

I originally tried using OpenAI’s Assistants threads, where a workflow would upload the file, create a thread, and send the file + message to the assistant. However, this created many frustrating and messy workflows with fail-safe protocols (to avoid triggering an API call while the previous one wasn’t finished).

The main issue with this method was that the assistant had trouble processing the file and often couldn’t properly read it - so it would revert to hallucinating responses. Even with my fail-safes exhausted, it was still too unpredictable. I also played with extracting text from the files rather than uploading them, but my application primarily deals with financial tables/models, and these almost always broke. And while only vectorizing, instead of sending the file, could work, that too was breaking the structure of my tables + Excel files.

Well, I finally found a working solution!

I can now upload a file and a question, get a response, and then my follow-up questions + conversational memory stay relevant. I was also able to configure text streaming for all follow-up responses. By pairing a file upload with a vector store, OpenAI essentially holds them both up next to each other when reasoning.

The workflow:

  1. Create the conversation / message
  2. Upload File to OpenAI (API Call) (Returns a file_ID)
  3. Create a new Vector Store (API Call) (Returns a vector_store_id)
  4. Upload File (Step 2) to Vector Store (Step 3) (API Call)
  5. Schedule an API Workflow (Send Conversation, vector ID, Initial Question, and scheduled for current time +4 seconds)
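For anyone wiring this up in code rather than a no-code tool, steps 2–4 can be sketched roughly like this with OpenAI’s Python SDK (the function and variable names are mine, not part of any official API):

```python
# Sketch of steps 2-4 (file upload -> vector store -> attach).
# Assumes the openai Python SDK:  pip install openai
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
# Note: on older SDK versions, vector stores live under client.beta.vector_stores.

def upload_and_vectorize(client, path):
    """Upload a file, create a vector store, and attach the file to it.

    Returns (file_id, vector_store_id)."""
    # Step 2: upload the raw file (returns a file_id)
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")

    # Step 3: create a brand-new vector store so answers stay tied
    # to this one document (returns a vector_store_id)
    store = client.vector_stores.create(name="single-doc-conversation")

    # Step 4: attach the file to the store; indexing is asynchronous,
    # which is exactly why the backend workflow below polls before asking
    client.vector_stores.files.create(vector_store_id=store.id,
                                      file_id=uploaded.id)
    return uploaded.id, store.id
```

Step 5 then just passes the returned IDs along to the scheduled backend workflow.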

The backend workflow was the real kicker. (A rough illustration below before I try to explain it)

Because file sizes will always vary, I wanted to poll for my responses as soon as possible while staying flexible for any size. So, on the conversation that we send to the backend workflow, I store a few values:

  1. the “Loop Count” (Number)
  2. the “Poll State” (Text)

The loop count is a ticker that counts how many times the loop has run, as well as a safety net against an endless loop. The first step of the workflow ensures that the loop hasn’t run more than 5-10 times (can vary) (red rectangle on the image). The poll state stores the status of the poll. If we try to ask our question before the file is uploaded, it can break the workflow or just respond “I don’t see a file.”

If we poll and the file isn’t fully uploaded/vectorized:

  • set conversation loop count + 1 // poll state to “Not Done”
  • schedule backend workflow again in +2 seconds
  • terminate this workflow

If we poll, and the file is completed, then this allows the “Ask Vector Store” to fire.
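The loop guard and the poll check might look something like this as plain functions (MAX_LOOPS and the function names are my own placeholders; the “completed” status comes from the vector store file object in OpenAI’s API):

```python
MAX_LOOPS = 8  # safety net against an endless loop; 5-10 can vary

def should_continue(loop_count, max_loops=MAX_LOOPS):
    """First step of the backend workflow: stop looping past the cap."""
    return loop_count < max_loops

def file_is_ready(client, vector_store_id, file_id):
    """Poll the file inside the vector store; 'client' is an
    openai.OpenAI instance. Only a 'completed' status means it is
    safe to fire the "Ask Vector Store" call."""
    vs_file = client.vector_stores.files.retrieve(
        file_id=file_id, vector_store_id=vector_store_id)
    return vs_file.status == "completed"
```

If `file_is_ready` is False, the workflow bumps the loop count, sets the poll state to “Not Done,” and reschedules itself.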

Now, if the response from the API call contains language like “I didn’t see a file,” “no file,” “invalid,” etc., this triggers the next loop. This also helps “refresh” things, as complex documents sometimes confuse the model.

  • set conversation loop count +1 // poll state to “Invalid”
  • schedule backend workflow again in +2 seconds
  • terminate this workflow
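The “did the model actually see the file?” check can be as simple as phrase matching (the phrase list here is an example; tune it to the failure messages your model actually produces):

```python
# Hypothetical detector for "the model didn't see the file" responses.
NO_FILE_PHRASES = (
    "i didn't see a file",
    "i didnt see a file",
    "no file",
    "invalid",
)

def looks_invalid(response_text):
    """True if the response suggests the file wasn't visible to the
    model, which should bump the loop count and set the poll state
    to "Invalid" so the workflow retries."""
    lowered = response_text.lower()
    return any(phrase in lowered for phrase in NO_FILE_PHRASES)
```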

Now, if we poll the response and it is a clean, valid response, the cycle ends: the response is created like normal and added to the conversation. At the end, the conversation’s poll state + loop count are reset.

For follow-up messages, everything can be done on the front end; we only need to reference the “Ask Vector” API call + stream the response like normal. The memory is referenced in the prompt, and I may pivot to doing this inside a thread instead.
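A follow-up question against the same vector store can be sketched with the Responses API’s file_search tool (the model name and the way history is spliced into the prompt are placeholders, not my exact setup):

```python
# Follow-up question against the same vector store, via the
# Responses API's file_search tool.
def ask_vector_store(client, vector_store_id, question, history=""):
    """'client' is an openai.OpenAI instance. Reusing a single
    vector_store_id keeps answers tied to this one document and
    never mixes in material from other files."""
    response = client.responses.create(
        model="gpt-4o-mini",  # placeholder model
        input=f"{history}\n\nUser: {question}",
        tools=[{"type": "file_search",
                "vector_store_ids": [vector_store_id]}],
        # stream=True would enable token streaming for follow-ups
    )
    return response.output_text
```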

While I may have skipped over some parts, I found this method extremely strong and cost-effective. My previous method with Claude was slower and more expensive, and it wasn’t until I understood more about OpenAI’s newer updates that things clicked. I discovered in their forums that while the traditional upload + assistant approach can work for most documents, it really struggles with big Excel files + tables. Apparently, if you send both a vector store and the file together at once, it’s easier for the model to look at them side by side and respond more accurately.

I’m still getting familiar and more aware of my surroundings in this space but I hope this can help some others!