
📰 [New Plugin] AWS Textract - OCR Text & Data [Now with PDF support]

Hi Bubblers!

With this plugin, you can automatically extract text, data, and structure from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.
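For readers calling Textract directly rather than through the plugin, here is a minimal Python sketch of how form fields can be read out of a response. It assumes the `Blocks` list returned by Textract's `AnalyzeDocument` operation with the `FORMS` feature type, where a `KEY_VALUE_SET` block tagged `KEY` points to its value block through a `VALUE` relationship and to its own words through a `CHILD` relationship. The helper names are hypothetical, and the plugin's own output format may differ.

```python
from typing import Dict, List


def _text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Hypothetical helper: join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                # Only WORD children are kept; SELECTION_ELEMENT (checkboxes) is ignored.
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(blocks: List[dict]) -> Dict[str, str]:
    """Hypothetical helper: map each form KEY text to its VALUE text."""
    by_id = {b["Id"]: b for b in blocks}
    fields = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            value_text = ""
            for rel in block.get("Relationships", []):
                # A KEY block links to its VALUE block(s) through a VALUE relationship.
                if rel["Type"] == "VALUE":
                    value_text = " ".join(_text_of(by_id[v], by_id) for v in rel["Ids"])
            fields[_text_of(block, by_id)] = value_text
    return fields
```

Given a key block whose words read "Name:" linked to a value block whose words read "Jane Doe", `extract_form_fields` returns `{"Name:": "Jane Doe"}`.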

:warning: WARNING: This service performs OCR operations specialised for documents as input. If you intend to detect text in an image such as a photo of a scene, please refer to the AWS Rekognition - Text Recognition plugin.

You can test out our AWS Textract - OCR Text & Data plugin with the live demo.

Enjoy!
Made with :black_heart: by wise:able
Discover our other Artificial Intelligence-based Plugins


Hi @redvivi ,
Could you show me a PDF example?

Hi @solutomn,

  • The AWS Textract service works in synchronous mode for PNG or JPEG files only, i.e. a request is sent and the response comes back right away within the same action.

  • PDF files can be used in asynchronous mode: a request is sent, processed by AWS, and the response is retrieved once the requester has checked that the job is complete. This requires your own AWS S3 bucket, plus the AWS queue and notification services (Amazon SQS and SNS) to be notified of completion, which adds a layer of complexity.

  • You can find an example of an asynchronous AWS request for another plugin in this editor. Should you require this implementation, we would be happy to investigate this possibility for you.
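The two modes above can be sketched in Python against a boto3-style Textract client, which exposes `detect_document_text`, `start_document_text_detection` and `get_document_text_detection`. This is a minimal sketch, assuming the PDF is already stored in an S3 bucket you own; to keep it short it polls the job status instead of wiring up the SNS/SQS notifications described above. The function names, bucket and key are hypothetical, and this is not how the plugin itself is implemented.

```python
import time
from typing import Callable, List


def detect_text_sync(client, image_bytes: bytes) -> List[str]:
    """Synchronous mode (PNG/JPEG): one call, the response comes back immediately."""
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def detect_text_async(client, bucket: str, key: str,
                      poll_seconds: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Asynchronous mode (PDF in S3): start a job, poll until done, collect all lines.

    Polling replaces SNS/SQS notifications here only to keep the sketch short.
    """
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]
    # Wait until the job leaves the IN_PROGRESS state.
    while True:
        page = client.get_document_text_detection(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_seconds)
    if status == "FAILED":
        raise RuntimeError(page.get("StatusMessage", "Textract job failed"))
    # Results may be split across several responses; follow NextToken to the end.
    lines: List[str] = []
    while True:
        lines += [b["Text"] for b in page["Blocks"] if b["BlockType"] == "LINE"]
        token = page.get("NextToken")
        if not token:
            return lines
        page = client.get_document_text_detection(JobId=job_id, NextToken=token)
```

With real AWS credentials, `client` would be created with `boto3.client("textract")`. Passing the client and `sleep` in as parameters keeps the logic testable without AWS; in production, a notification channel avoids repeated polling calls.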

I actually need users of my web application to be able to upload a PDF to my system and have all the data in that PDF saved in the database as attributes of one or more tables.
Is that possible?

Hi @solutomn,

It is of course possible, using either of the solutions described in my previous response.

If you must use Amazon Web Services, please let us know; we would be happy to modify our plugin to do so, but keep in mind that AWS has a slightly more complex setup.
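For readers who call Textract directly, turning a PDF's tables into database rows, as asked above, could start from the `TABLE` and `CELL` blocks in the response. This is a minimal sketch, assuming the document was analysed with the `TABLES` feature type and that each table's first row holds the column names; the helper names are hypothetical, and mapping the resulting records onto Bubble data types is not covered.

```python
from typing import Dict, List


def tables_to_rows(blocks: List[dict]) -> List[List[List[str]]]:
    """Hypothetical helper: turn TABLE/CELL blocks into tables of rows of cell text."""
    by_id = {b["Id"]: b for b in blocks}

    def children(block: dict) -> List[dict]:
        ids: List[str] = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                ids.extend(rel["Ids"])
        return [by_id[i] for i in ids]

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [c for c in children(block) if c["BlockType"] == "CELL"]
        if not cells:
            tables.append([])
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            words = [w["Text"] for w in children(cell) if w["BlockType"] == "WORD"]
            # RowIndex and ColumnIndex are 1-based in Textract responses.
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables


def rows_to_records(table: List[List[str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: use the first row as column names, one dict per data row."""
    if not table:
        return []
    header, *body = table
    return [dict(zip(header, row)) for row in body]
```

Each record can then be saved as one database entry whose fields are the table's column names.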

And how much do you charge to do that?
And how can I pay you? I'm in the city of Rio de Janeiro, Brazil.

Should you want to use Amazon Web Services, please reach out to us directly via DM; we would be happy to customise our existing AWS Textract - OCR Text & Data plugin for you.

Is the plugin easy to configure and use?
Do you have a step-by-step guide?

You can see an example of a similar implementation here, along with the step-by-step instructions described there.

For further assistance or customisation requests, please reach out to us directly via DM/private message.

I'm sorry, it's because I'm not a programmer and some of these resources are complex for me.
And one question: what does DM mean? Does it mean e-mail message?

Direct Message. Just sent you one.

Please check your private message inbox on this forum.

Hey guys!

Just to let you know that we have updated the details in the plugin response and introduced PDF support, along with asynchronous requests, so you can build a comprehensive document structure, as showcased in our demo.
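For multi-page PDFs, each Textract block carries a `Page` number, so one way to rebuild a per-page structure from the raw response is to group `LINE` blocks by that field. This is a minimal sketch, assuming a `Blocks` list already collected from every result page of an asynchronous job; the function name is hypothetical, and this is not necessarily how the plugin builds its own response.

```python
from collections import defaultdict
from typing import Dict, List


def lines_by_page(blocks: List[dict]) -> Dict[int, List[str]]:
    """Hypothetical helper: group LINE text by page number, pages in ascending order."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in blocks:
        if block["BlockType"] == "LINE":
            # Single-page inputs may omit Page, so default to page 1.
            pages[block.get("Page", 1)].append(block["Text"])
    return {p: pages[p] for p in sorted(pages)}
```

Lines keep the order in which Textract returned them within each page.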

Enjoy!