API searching Bubble database

Has anybody used OpenAI, Anthropic or another AI to search a large Bubble database without it costing a fortune? If so, how?

I have a few thousand pages of content I want the API to search through, along with using its other capabilities. It'd be great if someone has experience doing this.

Pages as in PDF pages, or do you mean just many records across many tables in your database?

Sorry for not being clear: I meant webpages, like blog posts. They could also be combined into a huge PDF if that's easier.

you’d want to create a search index for this

this is how most ai “search tools” work, and how google search works

basically a crawler visits webpages, extracts useful keywords and then puts them into an index

this way the searchable content is much shorter, like 1% of the original size, and the search happens very quickly and cheaply.
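to make the index idea concrete, here's a minimal sketch in plain Python. the pages, stopword list and function names are all illustrative, not from any real crawler:

```python
import re
from collections import defaultdict

# toy corpus standing in for crawled blog posts (illustrative data)
PAGES = {
    "post-1": "How to reduce API costs when searching large databases",
    "post-2": "Indexing blog content for fast keyword search",
    "post-3": "A guide to crawling webpages and extracting text",
}

STOPWORDS = {"to", "a", "the", "and", "for", "when", "how"}

def extract_keywords(text):
    """Keep lowercase word tokens, dropping common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def build_index(pages):
    """Map each keyword to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in extract_keywords(text):
            index[word].add(page_id)
    return index

def search(index, query):
    """Return page ids matching every keyword in the query."""
    words = extract_keywords(query)
    if not words:
        return set()
    return set.intersection(*[index.get(w, set()) for w in words])

index = build_index(PAGES)
print(search(index, "keyword search"))  # -> {'post-2'}
```

the index stores only short keyword sets, so queries never touch the full page text, which is where the cost saving comes from.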

depending on the size of the data to be searched and its complexity, you may also need to bucket it and do the search in a few steps.

  1. find the relevant bucket
  2. search the items within the bucket

and potentially a few layers of buckets.

** you’d only need this bucketing process if you have more data than the ai’s context limit allows
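the two-step bucket search above can be sketched like this. the buckets, the naive string matching, and the function names are all placeholders; in practice step 1 might be the ai picking a bucket, or your own tags:

```python
# illustrative buckets: topic -> items; a real setup might have the AI
# (or your own tags) assign each record to a bucket
BUCKETS = {
    "pricing": ["api cost breakdown", "subscription tiers explained"],
    "search": ["building a keyword index", "fuzzy matching basics"],
    "crawling": ["writing a polite crawler", "extracting text from pages"],
}

def find_bucket(query):
    """Step 1: pick the bucket whose name appears in the query
    (a naive stand-in for asking the AI to choose a bucket)."""
    for name in BUCKETS:
        if name in query.lower():
            return name
    return None

def search_bucket(bucket, query):
    """Step 2: search only the items inside the chosen bucket."""
    terms = query.lower().split()
    return [item for item in BUCKETS[bucket] if any(t in item for t in terms)]

def bucketed_search(query):
    bucket = find_bucket(query)
    return [] if bucket is None else search_bucket(bucket, query)

print(bucketed_search("search index"))  # -> ['building a keyword index']
```

with more layers of buckets you'd just repeat step 1 until you reach a bucket small enough to search directly.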

alternatively, just use a search tool that handles this whole process for you, like algolia…

for a pdf you’d need to run OCR to extract the content first, then index it

Yes, I’ll bucket the data for different queries.

How would you suggest running the workflow?

What I’m thinking is: the search runs against the database, the results get displayed on a different page, and that page gets scraped/crawled, converted to markdown and then fed into the AI. I’m struggling with figuring out how:

  1. Information put into the multiline input can be used to display data on the page that will be crawled.
  2. I do this without moving between pages.

I basically need this to all happen in the background.

well if you intend to build a custom searchable index then the chatgpt api will be very useful

  1. user adds text or uploads pdf
  2. ocr reads pdf/uploads and extracts text
  3. chatgpt then summarizes the text, extracts keywords and a name, and creates an index item
  4. user searches index items and relevant index items are returned, user opens each item to see the detail
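a rough sketch of steps 2–4 in Python. note the summarize/keyword step here is a trivial local stand-in, since the real version would call the chatgpt api; every name and the sample documents are illustrative:

```python
import re

def extract_text(upload):
    """Step 2 stand-in: a real app would run OCR / PDF text extraction here."""
    return upload["text"]

def make_index_item(doc_id, text):
    """Step 3 stand-in: ChatGPT would summarize and pick keywords here;
    this naive version uses the first sentence and the longest words."""
    summary = text.split(".")[0]
    words = sorted(set(re.findall(r"[a-z]+", text.lower())), key=len, reverse=True)
    return {"id": doc_id, "summary": summary, "keywords": words[:5]}

def search_index(items, query):
    """Step 4: match only the short index items, then open detail via id."""
    q = set(query.lower().split())
    return [it["id"] for it in items if q & set(it["keywords"])]

# detail records stay separate; only index items get searched
details = {
    "doc-1": {"text": "Crawling extracts webpage content. More detail here."},
    "doc-2": {"text": "Indexing keeps searches cheap. Longer explanation follows."},
}
items = [make_index_item(i, extract_text(d)) for i, d in details.items()]
print(search_index(items, "indexing"))  # -> ['doc-2']
```

the returned ids are what the user clicks to load the full detail record.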

algolia is a great search tool and will give you lightning fast results that can also be fuzzy/loosely relevant to the content

by splitting the data in two - index and detail - you essentially create a very lightweight searchable index that then links to the relevant detail items

Thanks for the suggestion, but I don’t think I explained what I’m building well.

I want to feed a chunk of the data into the AI API to help it answer the user’s query, but I want the search function I’m building to choose what data gets pulled from the database and fed into the API.

I plan on having specifically designed tags labelling all the data.

The data will be there to assist the AI, but the AI will also work as normal. I’ll have the data already in markdown etc. I don’t want the user to access the information directly.
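What's described here is essentially retrieval-augmented generation: the search picks tagged markdown chunks and they get prepended to the prompt as context. A minimal sketch, assuming tag-based selection; the chunks and tag names are made up, and the final API call is only indicated in a comment since it depends on which provider you use:

```python
# tagged data chunks, already in markdown (illustrative content)
CHUNKS = [
    {"tags": {"pricing"}, "markdown": "## Pricing\nPlan A costs $10/month."},
    {"tags": {"setup"}, "markdown": "## Setup\nInstall the plugin first."},
    {"tags": {"pricing", "limits"}, "markdown": "## Limits\nPlan A allows 1000 calls."},
]

def select_chunks(query_tags):
    """Your search function: pull only chunks whose tags match the query."""
    return [c["markdown"] for c in CHUNKS if c["tags"] & query_tags]

def build_prompt(user_question, query_tags):
    """Feed the selected chunks to the model as context; with no matching
    chunks the model still answers normally."""
    context = "\n\n".join(select_chunks(query_tags))
    return (
        "Use the context below when it is relevant.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_question}"
    )

prompt = build_prompt("How much is Plan A?", {"pricing"})
# `prompt` would then be sent as the user message to the OpenAI or
# Anthropic chat API; the user never sees the raw chunks directly.
print(prompt)
```

Because only the matching chunks are sent, each request stays small, which is also what keeps the per-query API cost down.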