📰 ᴺᴱᵂ ᴾᴸᵁᴳᴵᴺ AWS Textract - OCR Text & Data [Now with Queries support & Automated AWS Environment Setup]

redvivi · September 9, 2020, 4:25pm

Hi Bubblers !

With this plugin, you can automatically extract text, data, and structure from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.

WARNING: This service provides OCR-specialised operations based on a document as input. If you intend to detect text in an image such as a scene, please refer to the AWS Rekognition - Text Recognition plugin.

You can test out our AWS Textract - OCR Text & Data plugin with the live demo.

Enjoy !
Made by wise:able
Discover our other Artificial Intelligence-based Plugins

solutomn · February 24, 2021, 9:57am

Hi @redvivi ,
Could you show me a PDF example?
Thanks!
@solutomn
André

redvivi · February 24, 2021, 10:30am

Hi @solutomn,

The AWS Textract service works in synchronous mode for PNG or JPEG files only, i.e. a request is sent and the response comes back right away within the same action.
It is possible to use PDF files in asynchronous mode (a request is sent, processed by AWS, and the response is retrieved after the requestor checks the job completion status). This requires you to have your own AWS S3 storage and to use AWS queue and notification services to be notified of completion, which adds a layer of complexity.
You can find an example of an asynchronous AWS request for another plugin in this editor. Should you require this implementation, we would be happy to investigate this possibility for you.
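
For readers who want to see what the two modes look like outside Bubble, here is a minimal Python sketch, not the plugin's actual code. It assumes a boto3 Textract client created with `boto3.client("textract")` is passed in as `client`; the function names, the polling interval and the bucket/key arguments are illustrative assumptions. For simplicity it polls for the job status instead of wiring up notifications.

```python
import time


def detect_text_sync(client, image_bytes):
    """Synchronous mode: one request, the response comes back right away."""
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    # Keep only the detected lines of text.
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]


def analyze_pdf_async(client, bucket, key, poll_seconds=5.0, max_polls=60):
    """Asynchronous mode: start a job on a file stored in S3, then poll it.

    `bucket` and `key` are assumed to point at a PDF you already uploaded.
    """
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
    )
    job_id = job["JobId"]

    # Check the job completion status until it is no longer in progress.
    for _ in range(max_polls):
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            break
        if status == "FAILED":
            raise RuntimeError(f"Textract job {job_id} failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    # Large documents are returned in pages linked by NextToken.
    blocks = list(result["Blocks"])
    token = result.get("NextToken")
    while token:
        page = client.get_document_analysis(JobId=job_id, NextToken=token)
        blocks.extend(page["Blocks"])
        token = page.get("NextToken")
    return blocks
```

Passing the client in as an argument keeps the sketch independent of how credentials and regions are configured on your side.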

solutomn · February 25, 2021, 5:13pm

@redvivi
I actually need users of my web application to be able to upload a PDF to my system, so that all the data in that PDF can be saved in the database as attributes of one or more tables.
Is that possible?
Thanks

redvivi · February 25, 2021, 5:31pm

Hi @solutomn,

It is of course possible using either solution from my previous response.

If you must use Amazon Web Services please let us know, we would be happy to modify our plugin to do so, but keep in mind that AWS has a slightly more complex setup.

solutomn · February 25, 2021, 5:35pm

excellent!
And how much do you charge to do that?
And how can I pay you? I’m in the city of Rio de Janeiro, Brazil.

redvivi · February 25, 2021, 5:38pm

Should you want to use Amazon Web Services, please reach out to us directly via DM; we would be happy to customise our existing AWS Textract - OCR Text & Data plugin for you.

solutomn · February 25, 2021, 5:42pm

Is the plugin easy to configure and use?
Do you have a step-by-step guide?

redvivi · February 25, 2021, 5:47pm

You can see an example of a similar implementation here, along with the step-by-step instructions.

For further assistance or customisation requests, please reach out to us directly via DM/private message.

solutomn · February 25, 2021, 5:50pm

I’m sorry, it’s because I’m not a programmer and some resources are complex for me.
And one question: what does DM mean? Does it mean an e-mail message?

redvivi · February 25, 2021, 5:57pm

Direct Message. Just sent you one.

Please check your private message inbox on this forum.

redvivi · March 1, 2021, 8:53am

Hey guys !

Just to let you know that we have updated the details in the plugin response and introduced PDF support, along with asynchronous requests, so you can build a comprehensive document structure, as showcased in our demo.

Enjoy !

redvivi · March 17, 2022, 10:24pm

Oh, by the way, our demo now shows how to process the OCR response for forms and tables in Bubble, especially mapping each value to its key, which is notoriously difficult because the AWS Textract response format is quite convoluted.
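
To give an idea of why this mapping is convoluted, here is a rough Python sketch of the same idea outside Bubble. It is not what the plugin does internally, and the function names are made up for illustration. It assumes `blocks` is the `Blocks` list of a Textract forms response, where each key is a `KEY_VALUE_SET` block that points to its value block through a `VALUE` relationship, and the text itself sits in `CHILD` word blocks.

```python
def _text_of(block, by_id):
    """Join the text of a block's CHILD words (and selection marks)."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes report SELECTED / NOT_SELECTED instead of text.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def form_key_values(blocks):
    """Return {key text: value text} for every form field in `blocks`."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key_text = _text_of(block, by_id)
        value_texts = []
        # Follow the VALUE relationship from the key block to its value block(s).
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_texts.append(_text_of(by_id[value_id], by_id))
        pairs[key_text] = " ".join(v for v in value_texts if v)
    return pairs
```

A field left blank on the form comes back as an empty string, so you can still see that the key was detected.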

redvivi · May 11, 2022, 5:48pm

And now with queries support using AI
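
For the curious, a hedged Python sketch of how Textract queries look at the API level (this is an illustration, not the plugin's code, and the helper names are my own). You send natural-language questions in a `QueriesConfig`, and the response contains `QUERY` blocks linked to `QUERY_RESULT` blocks through an `ANSWER` relationship.

```python
def build_queries_config(questions):
    """Build a QueriesConfig from a mapping of alias -> question text."""
    return {
        "Queries": [
            {"Text": text, "Alias": alias} for alias, text in questions.items()
        ]
    }


def query_answers(blocks):
    """Return {alias or question: best answer text, or None if unanswered}."""
    by_id = {b["Id"]: b for b in blocks}
    answers = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        query = block["Query"]
        label = query.get("Alias") or query["Text"]
        # Collect every QUERY_RESULT block linked to this question.
        results = [
            by_id[result_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "ANSWER"
            for result_id in rel["Ids"]
        ]
        if results:
            best = max(results, key=lambda r: r.get("Confidence", 0.0))
            answers[label] = best["Text"]
        else:
            answers[label] = None
    return answers


# Usage with boto3 (assumed client, not executed here):
#   response = client.analyze_document(
#       Document={"Bytes": image_bytes},
#       FeatureTypes=["QUERIES"],
#       QueriesConfig=build_queries_config({"TOTAL": "What is the total amount?"}),
#   )
#   print(query_answers(response["Blocks"]))
```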

redvivi · November 3, 2022, 6:34pm

Hello Bubblers!

Just to let you know that this plugin has been updated to provide an automated script to configure your AWS environment.

Enjoy!

Hemanth · March 8, 2023, 3:59am

Hi @redvivi ,

Can we extract data from a multi-page PDF (like 100 pages) synchronously?

Sent you a DM. Please check.

redvivi · March 9, 2023, 8:12am

Hey @Hemanth ,

See the service limits of AWS Textract for more information:

https://docs.aws.amazon.com/textract/latest/dg/limits.html

aestela · September 27, 2023, 4:32pm

@redvivi Is there a way to filter/query results using Bubble filters on the sync module? We are trying to analyze documents from different countries and need to find specific elements within the documents, and the only “logical” way we can think of is to filter the results by comparing them to the expected entries.

Have you ever tested something like this?

redvivi · September 27, 2023, 4:56pm

Hi @aestela !

I would suggest looking at the demo editor, in particular the element named “Example of first form’s value extraction” and its associated filters, which match against an existing value.

You may find additional information in the Amazon Textract documentation, on the page “Lines and Words of Text”.

Should you wish to explore Textract’s “Query” feature in synchronous operation, it is not supported yet, but we are happy to have a look at your custom use-case.
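
As a rough illustration of the "compare against expected entries" idea, here is a small Python sketch of the filtering logic (the Bubble filters in the demo follow their own setup; the function name, the substring matching and the confidence threshold below are assumptions for the example). It assumes `blocks` is the `Blocks` list of a Textract response and keeps only `LINE` blocks.

```python
def find_expected_entries(blocks, expected, min_confidence=80.0):
    """Return {expected entry: [LINE texts that contain it]}.

    Matching is a case-insensitive substring test; lines below
    `min_confidence` are ignored.
    """
    matches = {entry: [] for entry in expected}
    for block in blocks:
        if block["BlockType"] != "LINE":
            continue
        if block.get("Confidence", 0.0) < min_confidence:
            continue
        text = block["Text"]
        for entry in expected:
            if entry.casefold() in text.casefold():
                matches[entry].append(text)
    return matches
```

For documents from different countries, you could keep one list of expected entries per country and run the same function with each list.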
