Any idea how to break large PDFs into chunks for OpenAI's davinci model?

So I’m working on a project where I have to feed PDF data to the model via the API and get it ready for questioning by users: a user uploads a PDF and then asks the AI questions about it. The problem is that the model only allows about 4,000 tokens, which causes a prompt-limit issue. Do you have any ideas on how I can solve this, for instance by breaking the PDF into chunks?

Hmm :thinking: Good question.

I’m assuming that you have already figured out a way to extract the text from the PDF (probably using OCR).
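If the PDF already has a text layer (i.e. it isn’t a scan), you may not even need OCR. Just a quick sketch using pypdf; the file name is a placeholder:

```python
from pypdf import PdfReader  # pip install pypdf


def extract_pdf_text(path: str) -> str:
    """Pull the text layer out of a (non-scanned) PDF, page by page."""
    reader = PdfReader(path)
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)


text = extract_pdf_text("uploaded.pdf")  # placeholder file name
```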

You can take the extracted text and split it into parts using regex. Then send each chunk to the model along with the same question, using a backend workflow that runs asynchronously. You could then ask the model which answer is better each time, and either stick with the current answer or move on to the next chunk until you’ve gone through all of the text. Something like the sketch below.
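Here’s a rough Python sketch of the chunk-and-ask idea, assuming the legacy Completions endpoint with text-davinci-003; the model name, character budget, and prompt format are just my assumptions, so adjust them to your setup:

```python
import re

import openai  # legacy openai-python (<1.0) Completions API assumed

openai.api_key = "YOUR_API_KEY"  # placeholder

MAX_CHARS = 8000  # roughly 2,000 tokens, leaving room for the question and answer


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Split the extracted text on blank lines (paragraphs) and pack the
    pieces into chunks that stay under a rough character budget."""
    paragraphs = re.split(r"\n\s*\n", text)
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current)
    return chunks


def ask_each_chunk(text, question):
    """Ask the same question against every chunk and collect the answers."""
    answers = []
    for chunk in split_into_chunks(text):
        prompt = f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:"
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=256,
            temperature=0,
        )
        answers.append(response.choices[0].text.strip())
    return answers
```

You could then make one more call that shows the model all the collected answers and asks it to pick (or merge) the best one.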

Or… you could have the model summarize the chunks and then ask the question over the summaries at the end. The downside is that the summaries might drop some detail the model needs to answer the question.
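That variant might look something like this, reusing `split_into_chunks` from the sketch above (again, the model name and prompts are just my assumptions):

```python
def summarize_then_ask(text, question):
    """Summarize each chunk first, then answer the question over the
    combined summaries in a single final call."""
    summaries = []
    for chunk in split_into_chunks(text):
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=f"Summarize the following text:\n\n{chunk}\n\nSummary:",
            max_tokens=200,
            temperature=0,
        )
        summaries.append(response.choices[0].text.strip())

    combined = "\n".join(summaries)
    final = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Context:\n{combined}\n\nQuestion: {question}\nAnswer:",
        max_tokens=256,
        temperature=0,
    )
    return final.choices[0].text.strip()
```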

Would something like that work? Just brainstorming here. :blush:

It’s like you read my mind! Exactly what I’ve been thinking! Thanks for your input, that cements it even more.

