📰 ᴺᴱᵂ ᴾᴸᵁᴳᴵᴺ AWS Textract - OCR Text & Data [Now with Queries support & Automated AWS Environment Setup]

Hi Bubblers !

With this plugin, you can automatically extract text, data, and structure from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.

:warning: WARNING: This service provides OCR-specialised operations based on a document as input. If you intend to detect text in an image such as a scene, please refer to the AWS Rekognition - Text Recognition plugin.

You can test out our AWS Textract - OCR Text & Data plugin with the live demo.

Enjoy !
Made with :black_heart: by wise:able
Discover our other Artificial Intelligence-based Plugins


Hi @redvivi ,
Could you show me a PDF example?
Thanks!
@solutomn
André

Hi @solutomn,

  • The AWS Textract service works in synchronous mode for PNG or JPEG files only, i.e. a request is sent and the response comes back right away within the same action.

  • It is possible to use PDF files in asynchronous mode: a request is sent, processed by AWS, and the response is retrieved after the requestor checks the job completion status. This requires you to have AWS S3 storage and to use AWS queues and notifications to be notified of completion, which adds a layer of complexity.

  • You can find an example of an asynchronous AWS request for another plugin in this editor. Should you require this implementation, we would be happy to investigate this possibility for you.
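For readers curious about what the asynchronous flow looks like outside Bubble, here is a minimal Python sketch. It is not how the plugin is implemented. It assumes `client` is a boto3 Textract client (`boto3.client("textract")`) and that the PDF already sits in an S3 bucket; the function name and the bucket/key values are hypothetical. To keep it short, it polls the job status instead of using the queue and notification setup described above.

```python
import time
from typing import Any, Callable, Dict, List


def run_async_text_detection(
    client: Any,
    bucket: str,
    key: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Start an asynchronous Textract job on an S3 object and collect all blocks."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state (polling is a
    # simplification; notifications avoid the repeated status calls).
    for _ in range(max_polls):
        page = client.get_document_text_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    # Anything other than SUCCEEDED (e.g. FAILED) is treated as an error here.
    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended as {page['JobStatus']}")

    # Results are paginated: follow NextToken until it is absent.
    blocks = list(page.get("Blocks", []))
    while page.get("NextToken"):
        page = client.get_document_text_detection(
            JobId=job_id, NextToken=page["NextToken"]
        )
        blocks.extend(page.get("Blocks", []))
    return blocks
```

A call would look like `run_async_text_detection(boto3.client("textract"), "my-bucket", "uploads/form.pdf")`, where the bucket and key are placeholders.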

@redvivi
I actually need users of my web application to be able to upload a PDF to my system so that all the data in that PDF can be saved in the database as attributes of one or more tables.
Is that possible?
Thanks

Hi @solutomn,

It is of course possible using both solutions in my previous response:

If you must use Amazon Web Services, please let us know; we would be happy to modify our plugin to do so. Keep in mind that AWS has a slightly more complex setup.

Excellent!
And how much do you charge to do that?
And how can I pay you? I’m in the city of Rio de Janeiro, Brazil.

Should you want to use Amazon Web Services, please reach out to us directly via DM; we would be happy to customise our existing AWS Textract - OCR Text & Data plugin for you.

Is the plugin easy to configure and use?
Do you have a step-by-step guide?

You can see an example of a similar implementation here, along with the step-by-step instructions described there.

For further assistance or customisation requests, please reach out to us directly via DM/private message.

I’m sorry, it’s because I’m not a programmer and some resources are complex for me.
And one question: what does DM mean? Does it mean an e-mail message?

Direct Message. Just sent you one.

Please check your private message inbox on this forum.

Hey guys !

Just to let you know that we have updated the details in the plugin response and introduced PDF support, along with asynchronous requests, so you can build a comprehensive document structure, as showcased in our demo:

Enjoy !

Oh, by the way, our demo now demonstrates how to process the OCR response for forms and tables in Bubble, especially mapping each value to its key, which is notoriously difficult as the AWS Textract response is quite convoluted.
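To give an idea of why the key/value mapping is fiddly, here is a rough Python sketch of the same idea applied to a raw Textract form response. It is an illustration only, not the plugin's code. It assumes `blocks` is the `Blocks` list returned by a forms analysis, where `KEY_VALUE_SET` blocks tagged `KEY` point to their `VALUE` block, and both point to their `WORD` children. The function names are hypothetical.

```python
from typing import Any, Dict, List


def _text_of(block: Dict[str, Any], by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the WORD children of a block; render checkboxes as [x] / [ ]."""
    parts: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                selected = child["SelectionStatus"] == "SELECTED"
                parts.append("[x]" if selected else "[ ]")
    return " ".join(parts)


def form_fields(blocks: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each form key's text to the text of its linked value."""
    by_id = {b["Id"]: b for b in blocks}
    fields: Dict[str, str] = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            # A KEY block links to its VALUE block through a VALUE relationship.
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(by_id[value_id], by_id) for value_id in rel["Ids"]
                )
        fields[_text_of(block, by_id)] = value_text
    return fields
```

Note that duplicate key texts overwrite each other in this sketch; a real form with repeated labels would need a list per key.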

And now with queries support using AI :slight_smile:
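For those wondering what a queries response looks like underneath, here is a small Python sketch, offered as an illustration rather than as the plugin's implementation. It assumes `blocks` comes from a Textract analysis requested with the `QUERIES` feature type, where each `QUERY` block links to its `QUERY_RESULT` blocks through an `ANSWER` relationship. The function name `query_answers` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def query_answers(blocks: List[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    """Map each query (by alias, else by question text) to its best answer."""
    by_id = {b["Id"]: b for b in blocks}
    answers: Dict[str, Optional[str]] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        name = query.get("Alias") or query["Text"]
        answer: Optional[str] = None
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            results = [by_id[result_id] for result_id in rel["Ids"]]
            if results:
                # Keep the answer Textract is most confident about.
                top = max(results, key=lambda r: r.get("Confidence", 0.0))
                answer = top.get("Text")
        # A query with no ANSWER relationship stays None (nothing found).
        answers[name] = answer
    return answers
```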


Hello Bubblers!

Just to let you know that this plugin has been updated to provide an automated script to configure your AWS environment :man_mechanic:t3:.

Enjoy!

Hi @redvivi ,

Can we extract data from a multi-page PDF (e.g. 100 pages) synchronously?

Sent you a DM. Please check.

Hey @Hemanth ,

See the service limits of AWS Textract for more information:

https://docs.aws.amazon.com/textract/latest/dg/limits.html

@redvivi Is there a way to filter/query results using Bubble filters on the sync module? We are trying to analyze documents from different countries and need to find specific elements within the documents, and the only “logical” way we can think of is to filter the results by comparing them to the expected entries.

Have you ever tested something like this?

Hi @aestela !

I would suggest referring to the demo editor: see the element named “Example of first form’s value extraction” and its associated filters, which match against an existing value.

You may find additional information on Lines and Words of Text - Amazon Textract
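Outside Bubble, the "compare against expected entries" idea could be sketched in Python as below. This is only one possible approach under stated assumptions: `blocks` is the `Blocks` list of a Textract response, each `LINE` block carries its text in `Text`, and the label sits at the start of the line. The function name `find_entry`, the 0.8 threshold, and the use of `difflib` fuzzy matching are illustrative choices, not something the plugin provides.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Optional, Tuple


def _normalise(text: str) -> str:
    """Lower-case, drop colons, and collapse whitespace before comparing."""
    return " ".join(text.lower().replace(":", " ").split())


def find_entry(
    blocks: List[Dict[str, Any]],
    expected_labels: List[str],
    min_ratio: float = 0.8,
) -> Optional[Tuple[str, str]]:
    """Return (matched label, rest of the line) for the best match, or None."""
    best: Optional[Tuple[str, str]] = None
    best_ratio = min_ratio
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        words = block.get("Text", "").split()
        for label in expected_labels:
            # Compare the label with the same number of leading words.
            n = len(label.split())
            head = " ".join(words[:n])
            ratio = SequenceMatcher(
                None, _normalise(label), _normalise(head)
            ).ratio()
            if ratio >= best_ratio:
                best, best_ratio = (label, " ".join(words[n:])), ratio
    return best
```

Passing one label per language or country, for example `["Date of birth", "Date de naissance"]`, lets the same lookup cover documents from different origins.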

Should you wish to explore Textract’s “Query” feature in synchronous operation, it is not supported yet, but we are happy to have a look at your custom use case.