Convert any file type to text for LLMs

I've built a set of functions that accept virtually any useful file type and return the extracted text: PDF, MS Office, TXT, JSON, HTML, website URLs, audio (via speech transcription), YouTube videos, CSV, etc.
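To give a rough idea of the "one entry point, many file types" shape, here is a minimal sketch using only the Python standard library. It is not the actual implementation: the names `extract_text` and `EXTRACTORS` are made up for illustration, and it only covers the easy text-based formats (TXT, HTML, JSON, CSV). PDF, Office, audio and video would need dedicated libraries or services.

```python
import csv
import io
import json
import os
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects visible text and skips the contents of <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(raw: str) -> str:
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()
    return "\n".join(parser.parts)


def json_to_text(raw: str) -> str:
    # Re-serialise with indentation so the structure stays readable as text.
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)


def csv_to_text(raw: str) -> str:
    rows = csv.reader(io.StringIO(raw, newline=""))
    return "\n".join(" | ".join(cell.strip() for cell in row) for row in rows)


# Hypothetical registry: file extension -> extractor function.
EXTRACTORS = {
    ".txt": lambda raw: raw,
    ".html": html_to_text,
    ".htm": html_to_text,
    ".json": json_to_text,
    ".csv": csv_to_text,
}


def extract_text(filename: str, raw: str) -> str:
    """Pick an extractor from the file extension and return plain text."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        extractor = EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or filename!r}")
    return extractor(raw)
```

Supporting a new format in this sketch is just a matter of registering another function in `EXTRACTORS`, which is why a single endpoint can sit in front of many formats.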

I'm thinking about releasing it as a simple API, as I can't find any current options other than setting up a separate API for each file type. All you'd do is send the file URL in your API call, and it would return the extracted text either synchronously or via webhook if you prefer.
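As a sketch of what a call could look like: nothing is released yet, so everything here is an assumption. The endpoint `https://api.example.com/v1/extract`, the `file_url` and `webhook_url` field names, and the bearer-token header are all placeholders.

```python
import json
import urllib.request
from typing import Optional

# Placeholder endpoint; the real URL does not exist yet.
API_URL = "https://api.example.com/v1/extract"


def build_extract_request(
    file_url: str, api_key: str, webhook_url: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request; include webhook_url to ask for async delivery."""
    payload = {"file_url": file_url}
    if webhook_url:
        payload["webhook_url"] = webhook_url  # assumed field name
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def extract(
    file_url: str, api_key: str, webhook_url: Optional[str] = None, timeout: float = 60
) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_extract_request(file_url, api_key, webhook_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Without `webhook_url` the caller would wait for the text in the response body. With it, the response would presumably just acknowledge the job, and the text would be posted to the webhook later, which suits slow inputs like audio or video.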

This would of course be targeted at developers (particularly Bubble developers building LLM integrations that require text to be extracted from files). Example features you could build:

  • get file text for use in an LLM like ChatGPT
  • enable text search through documents stored on your app
  • get file text for use cases like uploading an invoice → automatically extracting and classifying its contents

Anyone interested?

I'm interested, but can you explain how it works and give some use cases where you would use it or think it could add value? In particular, what does the output look like when the input is something like JSON or HTML?