Did you ever want to take a paper document and put it into the database as a pre-defined thing?
This could be a PO, Sales Order, Invoice, Receipt, Bill of Lading … etc.
Thought I would post something I just built as a test prior to developing a feature for my app.
Basically, I wanted to take a paper document and create a thing out of it. In my use case, I created an Invoice from an image of a real invoice including hand-written text.
This uses ChatGPT’s /vision endpoint which you can find here.
This tool inputs a Base64 encoded string of the image and sends it to ChatGPT’s /vision endpoint (see docs above). The API call also includes a prompt to ask ChatGPT to look at the invoice and gather data and put it into a JSON script using parameters which match fields in my database.
The API call returns a JSON script that is then sent to a backend worflow that I exposed as an API endpoint. The backend workflow first creates the header of the invoice and then it loops over the lines (which are imbedded in the JSON payload as an array called invoiceLines). This is done using “run wf on a list”. Finally, I add each line to the header that was just created.
The result is a new invoice. It works nearly flawlessly. Sometimes it gets a few things wrong but can easily be edited. I haven’t played with fine tuning the prompt. I’m tempted to make a plugin out of it!
PLEASE NOTE: The reason the Bill To customer is Denton’s Auto Repair is because the user belongs to that organization. As this is a test, I’m not interested in that part of it because I’m assuming the person uploading the invoice is uploading it because it is one of his company’s suppliers.
The world didn’t wait for ChatGPT to process those document types to return normalized values…
Specialized OCR services such as Azure, AWS or GCP are all available on the plugin marketplace, notwithstanding the cost per page being in magnitude lower.
If the only tool you have is a , it is tempting to treat everything as if it were a nail.
Obviously you have a horse in the race. Well firstly, I never suggested this was the only way to get an invoice in the system. Even quickbooks has that! Only suggesting that it was another way and that this is only step 1 in a number of other accounting processes - like intelligent 3-way match for example.
If we only replied with the standard option in this community, we might not explore new boundaries or learn new things. So please try and be more positive about other people’s work.
The nice thing about being open-minded is that it can lead to solving other problems that were never even considered with the original and constrained approach.
The interesting thing about this method is that a) it is unconstrained - meaning, it can see other things on the page and is not limited to a specific output, and b) it allows you do do all sorts of fine tuning - can customize the recognition and manipulate it using only plain english or any other language and c) you can have it invoke another wf automatically to look something up (like a supplier name or PO number - something that’s useful for an accounting control called “3-way-matching”) - all without even having to do something with a constrained text result and, d) it automatically formats it into JSON (something your method does not do) amongst other things.
How does your solution look up the supplier and decide whether it is already in the db or not? Or how does it know to look up part numbers and ensure that they match? AI can string together multi-agents which can perform other tasks for a complex outcome - like a human might.
Can you post a video of your solution working on the invoice with the hand-written text? I’d be very interested to see it in operation.