The application I am building needs to produce accurate word counts for uploaded Word documents and PDF files in multiple languages.
Currently I am aware of only one plugin for this, by zerocode, which I purchased a while back. Using it revealed the following two big flaws:
- It doesn’t count words in PDFs accurately, apparently because extra characters or something similar in the PDF are being counted. I tested this with the same text in both formats: the Word document shows the correct count of 100 words, whereas the PDF copy of the same document produces a count of 500 words.
- Even with uploaded Word documents, it does not show the correct word count for languages that use non-Latin scripts, such as Arabic and Russian. The results are way off (for example, it shows 1 word for an Arabic Word document containing 65 words).
Having searched, I found no other plugins that provide what I need. Could any of the OCR plugins by @redvivi give me exactly this?
If not, can anyone suggest the best way to achieve this, i.e. a word count that is as accurate as possible for uploaded .docx and .pdf files in all, or at least most, language scripts?
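To make the requirement concrete, here is a minimal sketch in Python of the kind of script-independent count I am after. It assumes the text has already been extracted from the .docx or .pdf file (extraction is not shown), and the function name `count_words` is only my own illustration, not part of any plugin. It splits on whitespace and counts every token that contains at least one letter or digit, so it should handle Arabic and Russian, but not scripts that do not separate words with spaces, such as Chinese or Thai.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Assumption: `text` is plain text already extracted from the file.
    str.split() with no arguments splits on any Unicode whitespace, and
    str.isalnum() is Unicode-aware, so Cyrillic and Arabic letters count.
    """
    return sum(
        1
        for token in text.split()
        # Skip tokens that are pure punctuation, e.g. a lone dash.
        if any(ch.isalnum() for ch in token)
    )


if __name__ == "__main__":
    print(count_words("Hello, world - again"))  # 3
    print(count_words("Привет, как дела?"))     # 3
    print(count_words("مرحبا بالعالم"))          # 2
```

If a PDF extractor returns each character as a separate token, a count like this would still be inflated, so the extraction step probably matters as much as the counting itself.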
Thanks in advance