I’m building an app in Bubble where users can upload documents (e.g., PDFs, Word files) containing information such as visa details. For example, a visa document might include information like:
First Name
Last Name
Visa Expiry Date
Passport Number
I’d like to extract this data and use it to populate corresponding fields in my database.
What I’ve Tried So Far:
I’ve set up a process where I extract the text from the uploaded document and send it to ChatGPT via an API. ChatGPT converts the text into a structured JSON format like this:
However, I’m not sure how to process this JSON response and use it to create new database entries. Specifically:
Is using the ChatGPT API to convert the raw text from the document into JSON a reasonable approach?
Once I have the structured data (e.g., JSON) stored in a database field, what’s the best way to map it to individual database fields and create new entries?
I’m looking for a high-level overview of the best approach to achieve this, including any tools, workflows, or plugins that might simplify the process.
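To make the mapping question concrete, here is a minimal sketch in Python (outside Bubble, purely for illustration) of turning a JSON reply into one validated record before it is saved. The key names (first_name, last_name, visa_expiry_date, passport_number), the ISO date format, and the VisaRecord and parse_visa_json names are all assumptions based on the field list above, not the actual output of my prompt.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class VisaRecord:
    """Hypothetical record mirroring the four fields listed in the post."""
    first_name: str
    last_name: str
    visa_expiry_date: date
    passport_number: str


# Assumed key names; the real keys depend on what the prompt asks for.
REQUIRED_KEYS = ("first_name", "last_name", "visa_expiry_date", "passport_number")


def parse_visa_json(raw: str) -> VisaRecord:
    """Parse the model's JSON reply and map it onto a typed record."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")

    # Reject replies where a required field is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    return VisaRecord(
        first_name=str(data["first_name"]).strip(),
        last_name=str(data["last_name"]).strip(),
        # Assumes the prompt asks for dates as YYYY-MM-DD.
        visa_expiry_date=date.fromisoformat(str(data["visa_expiry_date"]).strip()),
        passport_number=str(data["passport_number"]).strip().upper(),
    )
```

Validating before saving means a malformed reply fails loudly instead of creating a half-empty database entry. In Bubble the equivalent check would live in the workflow that creates the entry.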
I ran a data-extraction test on PDF files, pulling data out of brokerage notes. I didn’t use ChatGPT; I’m not that advanced, just a beginner in Bubble. What I can assure you is that regex will be your partner. Below are two screenshots. The first shows the extracted data, each value in its own field, so that later I can save them to my database separately in whatever way I want. The second shows how I extracted the value of "transactions carried out" from the PDF file. I hope it gives you some idea.
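For anyone who hasn't used regex before, here is a rough sketch of the idea in Python rather than Bubble. It assumes the extracted text has one "Label: value" pair per line, which may not match a real visa document; the labels, the FIELD_PATTERNS table, and the extract_fields name are made up for illustration.

```python
import re

# One pattern per field. Each assumes a "Label: value" line in the text;
# real documents will likely need different or looser patterns.
FIELD_PATTERNS = {
    "first_name": re.compile(r"First Name:[ \t]*(.+)"),
    "last_name": re.compile(r"Last Name:[ \t]*(.+)"),
    "visa_expiry_date": re.compile(r"Visa Expiry Date:[ \t]*(\d{4}-\d{2}-\d{2})"),
    "passport_number": re.compile(r"Passport Number:[ \t]*([A-Z0-9]+)"),
}


def extract_fields(text):
    """Return a dict of field name -> captured value, or None if not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result
```

Each captured value ends up in its own slot, so it can be saved to its own database field. A field that is not found comes back as None instead of raising an error, which makes it easy to spot documents that need a different pattern.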
There are several very good AI/OCR tools now that do this really well.
Previously I would have used Docparser or something similar, but now you can find good OCRs on RapidAPI and Apify and only pay per record processed (which usually works out very cheap).
I’d set this up in the API Connector as an action step.
Then I’d run it in backend workflows (it may take some time to complete, so you don’t want it on the front end). In the backend, I’d then schedule an API workflow with the results of the OCR (you’d need to initialize the API Connector call to get the data structure, so you can map it into another API workflow that processes it).
If it only returns 1 object and not an array, you don’t even need another API workflow and could just create/update a database item from the API response.
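The object-versus-array point can be sketched outside Bubble like this. It is an illustrative Python snippet, not how Bubble handles it internally, and the records_from_response name is hypothetical. The idea is to normalize whatever the OCR/AI service returns into a list, so a single object and an array go through the same "create one item per record" step.

```python
import json
from typing import Any, Dict, List


def records_from_response(body: str) -> List[Dict[str, Any]]:
    """Normalize an API response body into a list of record dicts."""
    parsed = json.loads(body)

    # A single object: one database item, no second workflow needed.
    if isinstance(parsed, dict):
        return [parsed]

    # An array of objects: one database item per element.
    if isinstance(parsed, list) and all(isinstance(item, dict) for item in parsed):
        return parsed

    raise ValueError("expected a JSON object or an array of JSON objects")
```

With a single object the list has one element, which corresponds to creating or updating one database item directly from the response. With an array, each element would be handed to its own processing step.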
Yes, I am using backend workflows as you suggested, @mitchbaylis. @krenkel, I couldn’t get the regex approach to do what I needed; it’s not something I’ve used before, so I would probably need to do some reading up on it. But thanks for the suggestions!
My process is:
Get block of text input from user,
Convert to JSON using ChatGPT (works OK with enough time spent on the prompt)
Use JSON Manipulator plugin to extract the data from the JSON and insert into database fields
The last step I am still having problems with but I will start a new thread with a more specific question on it.