I have built a set of functions that accept virtually any useful file type and return the extracted text: PDF, MS Office, TXT, JSON, HTML, website URLs, audio (via speech transcription), YouTube videos, CSV, and so on.
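To make the "one entry point, many file types" idea concrete, here is a minimal sketch that routes a file to an extractor based on its extension. It is not my actual implementation: `extract_text`, `EXTRACTORS` and the helper names are made up for illustration, and it only covers the formats the Python standard library can handle on its own (TXT, JSON, CSV, HTML). PDF, Office, audio and YouTube would each need their own extractor registered in the same table.

```python
import csv
import io
import json
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse


class _TextOnly(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def _from_txt(raw: bytes) -> str:
    return raw.decode("utf-8", errors="replace")


def _from_json(raw: bytes) -> str:
    # Re-serialise so the output is consistently indented text.
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)


def _from_csv(raw: bytes) -> str:
    rows = csv.reader(io.StringIO(raw.decode("utf-8", errors="replace")))
    return "\n".join(" ".join(row) for row in rows)


def _from_html(raw: bytes) -> str:
    parser = _TextOnly()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical registry: one extractor per file extension.
EXTRACTORS = {
    ".txt": _from_txt,
    ".json": _from_json,
    ".csv": _from_csv,
    ".html": _from_html,
    ".htm": _from_html,
}


def extract_text(file_url: str, raw: bytes) -> str:
    """Pick an extractor from the URL's file extension and run it on the bytes."""
    ext = PurePosixPath(urlparse(file_url).path).suffix.lower()
    if ext not in EXTRACTORS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    return EXTRACTORS[ext](raw)
```

A real service would probably also look at the response's content type rather than trusting the extension alone, but the routing idea stays the same.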
I’m thinking about releasing it as a simple API, since I can’t find any existing option other than setting up a separate API for each file type. All you’d do is send the file URL in your API call, and it would return the extracted text either synchronously or via webhook if you prefer.
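As a rough idea of what a call could look like, here is a sketch that builds the request in Python. Nothing here is final: the endpoint `https://api.example.com/v1/extract`, the `file_url` and `webhook_url` field names, and the bearer-token auth are all placeholders I am assuming for illustration.

```python
import json
import urllib.request
from typing import Optional

# Placeholder endpoint, not a real service.
API_URL = "https://api.example.com/v1/extract"


def build_extract_request(
    file_url: str,
    api_key: str,
    webhook_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a file's text.

    Without webhook_url the call is meant to be synchronous; with it,
    the extracted text would be delivered to that URL later.
    """
    payload = {"file_url": file_url}
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Sending it would just be `urllib.request.urlopen(request)`. In Bubble the same thing would be a single POST in the API Connector with the file URL in the body.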
This would of course be targeted at developers, particularly Bubble developers building LLM integrations that need text extracted from files. Example features you could build:
- get file text to use with an LLM like ChatGPT
- enable text search through documents stored on your app
- get file text for use cases like uploading an invoice → automatically extracting and classifying its contents
Anyone interested?