Hi everyone,
I’m looking for an API to which I can send any document and it returns the extracted content. The formats I need to support include:
- Text and Markdown Files: .TXT, .MD
- Text Documents: .DOC, .DOCX, .ODT, .OTT, .RTF
- Presentations: .PPTX, .POTX, .ODP, .OTP
- Spreadsheets: .XLS, .XLSX, .XLSB, .XLSM, .XLTX, .CSV, .ODS, .OTS
- HTML and XML Files: .HTML, .HTM, .ATOM, .RSS, .XML
- PDF: .PDF
- Diagrams and Graphics: .ODG, .OTG
Does anyone have a recommendation for such an API that can handle these formats and return the extracted content?
Thanks in advance!
Zeroic
2
Hi! You can use OpenAI’s Assistants API to do this
1 Like
Zeroic
4
- Upload your files to the vector store
- Create an assistant with the prompt to give you the extracted content from the document
- Create a thread and pass the message to it
- Create a run and pass the assistant ID to it
You should get the result
1 Like
Very interesting, thanks!
I should be able to test this in open-ai’s playground to see if this works, right?
system
Closed
7
This topic was automatically closed after 70 days. New replies are no longer allowed.