PDF data extractor

a.v.krikunenko · September 28, 2019, 7:05pm

Dear Bubblers,
I have a small question about extracting data from PDF / Word documents.
Specifically, my app receives a continuous stream of PDF / DOC documents. I need to extract specific data from these PDFs (DOCs) and save it to the database. All documents share the same field structure.
I would like to find out whether there are any suitable plugins (or free APIs) for this functionality.

BR,
Alexander

gilles · September 28, 2019, 7:13pm

Hi - try Azure Vision. They have an excellent OCR, and you can easily connect an API from Bubble to Azure.

a.v.krikunenko · September 28, 2019, 7:44pm

Thanks. Will read about this API.
But now I have a new question: is OCR really the best tool for this use case?
I mean, OCR is unavoidable for extracting text from images. But in my case the documents are structured (specifically, the data sits in specific tables in Word / PDF). It seems much easier to extract data from a Word table than from images via OCR.
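
To illustrate the point about Word tables: a .docx file is a ZIP archive whose main body is stored as XML in word/document.xml, so table cells can be read directly without any OCR. Below is a minimal sketch in Python using only the standard library. The function names (extract_tables, table_to_record) and the two-column "field name / value" layout are assumptions made for illustration; they are not part of Bubble or of any plugin mentioned in this thread. It does not cover the legacy binary .doc format or PDFs, and it only looks at rows and paragraphs that are direct children of a table or cell.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_tables(docx_file):
    """Return every table in a .docx as a list of rows of cell strings.

    `docx_file` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # A cell holds one or more paragraphs; join the text runs
                # of each paragraph, then join paragraphs with newlines.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.findall(W + "p")
                ]
                cells.append("\n".join(paragraphs).strip())
            rows.append(cells)
        tables.append(rows)
    return tables


def table_to_record(rows):
    """Map a two-column table to {field name: value}.

    Assumed layout: first column is the field name, second is the value.
    """
    return {row[0]: row[1] for row in rows if len(row) >= 2 and row[0]}
```

With something like this running in a small backend service, each uploaded document could be turned into a record via table_to_record(extract_tables(path)[0]) and then sent to the Bubble database through an API call. OCR would then only be needed for scanned documents or image-only PDFs.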
