Hey everyone,
I am currently developing a tool that uses the OpenAI API to make requests. I am trying to reduce costs by using cheaper models while maintaining accuracy, and have come across the cascading approach (FrugalGPT), which can reportedly reduce costs by up to 60% or so.
This basically works by sending each query to a cheap model first, and only escalating to a higher-cost model when the cheap model's answer seems unreliable (as I understand it).
I am not sure how to best implement this in Bubble, does anyone have any ideas how to do that efficiently? I thought about just adding a confidence score metric as a tool in my request to GPT, and then something like: if confidence is lower than 3, go to the higher model.
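In plain Python, the confidence-gated cascade I'm describing might look something like the sketch below. The `ask_model` helper and the model names are stand-ins for illustration (in a real setup `ask_model` would make the actual API call and ask the model to rate its own confidence via a tool or structured output); in Bubble, the equivalent would be two API Connector calls chained with a conditional.

```python
# Sketch of a two-tier cascade: try the cheap model first, escalate
# only when its self-reported confidence falls below a threshold.
# `ask_model` is a hypothetical stand-in for a real API call; here it
# returns canned values so the cascade logic is easy to follow.

def ask_model(model: str, query: str) -> tuple[str, int]:
    """Return (answer, confidence on a 1-5 scale).
    Replace with a real API call that also asks the model
    to rate its own confidence."""
    canned = {
        "cheap-model": ("cheap answer", 2),    # low confidence -> escalate
        "strong-model": ("expensive answer", 5),
    }
    return canned[model]

def cascade(query: str, threshold: int = 3) -> tuple[str, str]:
    """Return (answer, model_used)."""
    answer, confidence = ask_model("cheap-model", query)
    if confidence >= threshold:
        return answer, "cheap-model"
    # Confidence too low: pay for the stronger model.
    answer, _ = ask_model("strong-model", query)
    return answer, "strong-model"
```

With the canned values above, the cheap model reports confidence 2, so the query escalates and the strong model's answer is returned.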
The problem here might be the total time to answer a prompt, as multiple API calls need to be completed in sequence, increasing total response time (bad for user happiness) and also increasing WU, potentially making it less efficient. Another issue is that cheap models can be overconfident, which could make the self-reported confidence score a poor proxy for actual accuracy.
Any thoughts or advice on this would be greatly appreciated.