Good articles on ChatGPT & large documents

stuart4 · May 5, 2023, 10:11pm

I thought I would share these because they are very well written and should be useful for anyone trying to understand why large documents can’t be fed directly into GPT-3 or GPT-4, as well as how vector databases work and how to scrape data from large documents.

Here is an excerpt:

The problem is that these models have limited context. You can only fit a few thousand words into them at a time. If you need to squeeze in more, tough luck! You’ve either got to fine-tune the model by training it on the data (which you can’t do with ChatGPT, only the older GPT-3 model) or, what is usually the better option, you need to extract only the relevant text for your specific prompt.

To give an example: If you want to have an answer to the question “What’s the best way to grill a steak?” you can’t just feed ChatGPT an entire cookbook all at once. These models can only understand 3000-6000 words at once, so you’ve got to be clever about which content you feed it.

Of course, a naive approach would be to just run a regex for “steak” and include those pages, but you’ll miss out on all the pages that talk about “beef”, “red meat”, “burgers”, and many other similarly related concepts. It’s easy as a human reading through the index to relate these items, but it’s very difficult to write a program to exhaustively map every related concept (without a significant time investment).
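To make the regex problem concrete, here is a small illustrative sketch (not from the article) using made-up cookbook pages. A case-insensitive search for the whole word “steak” finds the pages that literally say it and silently skips the ones about beef and burgers.

```python
import re

# Hypothetical cookbook pages, keyed by page number (made up for illustration).
pages = {
    12: "Sear the steak in a hot cast-iron pan for two minutes per side.",
    47: "Let the beef rest before slicing so the juices redistribute.",
    88: "Burgers grill best over direct, medium-high heat.",
    130: "Steak sauce: whisk together mustard, Worcestershire and butter.",
}

# Naive keyword retrieval: keep any page containing the word "steak".
pattern = re.compile(r"\bsteak\b", re.IGNORECASE)
matched = [num for num, text in pages.items() if pattern.search(text)]
missed = [num for num in pages if num not in matched]

print("matched:", matched)  # pages that literally contain "steak"
print("missed:", missed)    # relevant beef/burger pages that were skipped
```

The whole-word pattern `\bsteak\b` would even miss the plural “steaks”, which shows how quickly hand-written keyword rules multiply.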

And that’s where embeddings come in. They’re a clever way to slice up the cookbook into smaller, more manageable chunks that can be fed into the limited context of a language model like ChatGPT.
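The article describes embeddings at a high level. In a typical retrieval setup, each chunk and the user’s question are converted into vectors by an embedding model, and the chunks whose vectors are closest to the question’s (commonly measured by cosine similarity) are the ones placed in the prompt. The sketch below shows only that ranking step. The three-dimensional vectors, chunk names, and the helper names `cosine` and `top_k` are invented for illustration; real embedding vectors come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Toy vectors, made up for illustration; a real system gets them from an embedding model.
chunk_vecs = {
    "p47 resting beef": [0.9, 0.1, 0.0],
    "p88 grilling burgers": [0.8, 0.3, 0.1],
    "p201 baking bread": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of "What's the best way to grill a steak?"
query_vec = [0.85, 0.2, 0.05]

print(top_k(query_vec, chunk_vecs, k=2))  # the two beef-related chunks
```

A vector database performs this same nearest-neighbour lookup at scale, typically using approximate indexes instead of sorting every chunk.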

There are also some other good links in the articles.
