I have a small question about extracting data from PDF / Word documents.
Specifically, my app receives a continuous stream of PDF / DOC documents. I need to extract specific data from these PDFs (DOCs) and save it to a database. All the documents share the same field structure.
I would like to find out whether there are any suitable plugins (or free APIs) for this.
Hi - try Azure Vision. They have excellent OCR, and you can easily connect to the Azure API from Bubble.
Thanks. Will read about this API.
But now I have a new question: is OCR really the best tool for this use case?
I mean, OCR is unavoidable for extracting text from images. But in my case the documents are structured (specifically, the data sits in particular tables in Word / PDF). It seems much easier to extract data from a Word table than from images via OCR.
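To illustrate why a Word table can be read without OCR: a .docx file is a ZIP archive whose main text lives in word/document.xml, with tables stored as explicit row and cell elements. Below is a minimal sketch using only the Python standard library. The function name extract_tables is made up for this example, and it assumes the modern .docx format; it does not cover legacy .doc files or PDFs, and it would have to run outside Bubble (for example, behind an API that the app calls).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def extract_tables(docx_file):
    """Return every table in a .docx as a list of rows of cell strings.

    docx_file may be a path or a binary file-like object.
    (Hypothetical helper for illustration, not part of any library.)
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    # iter() walks the whole tree, so nested tables are listed separately
    for tbl in root.iter(f"{{{W_NS}}}tbl"):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            cells = []
            for tc in tr.findall("w:tc", NS):
                # A cell holds one or more paragraphs; each paragraph's
                # text is split across <w:t> runs that we join back up.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
                    for p in tc.findall("w:p", NS)
                ]
                cells.append("\n".join(paragraphs).strip())
            rows.append(cells)
        tables.append(rows)
    return tables
```

Since all the documents share one layout, the wanted values could then be picked by position, e.g. tables[0][1][1] for the second cell of the second row of the first table, before saving them to the database. PDFs are a different case: they have no table elements, so a PDF text-extraction tool (or OCR for scanned pages) would still be needed there.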