Hi everyone,

I’m looking for an API to which I can send any document and it returns the extracted content. The formats I need to support include:

  • Text and Markdown Files: .TXT, .MD
  • Text Documents: .DOC, .DOCX, .ODT, .OTT, .RTF
  • Presentations: .PPTX, .POTX, .ODP, .OTP
  • Spreadsheets: .XLS, .XLSX, .XLSB, .XLSM, .XLTX, .CSV, .ODS, .OTS
  • HTML and XML Files: .HTML, .HTM, .ATOM, .RSS, .XML
  • PDF: .PDF
  • Diagrams and Graphics: .ODG, .OTG

Does anyone have a recommendation for such an API that can handle these formats and return the extracted content?

Thanks in advance!

Hi! You can use OpenAI’s Assistants API to do this

1 Like

Can you ellaborate?

  1. Upload your files to the vector store
  2. Create an assistant with the prompt to give you the extracted content from the document
  3. Create a thread and pass the message to it
  4. Create a run and pass the assistant ID to it

You should get the result

1 Like

Very interesting, thanks!
I should be able to test this in open-ai’s playground to see if this works, right?

Yes :slight_smile:

1 Like

This topic was automatically closed after 70 days. New replies are no longer allowed.