My app has recently started getting loads of Workflow error - Database query timeout responses that cause a workflow to fail. Can someone tell me how to fix this specific example?
The table in question currently has about 600k entries. I just want it to fetch any single entry that matches a single WHERE clause:
I think this means that as soon as the database finds a single entry that matches that criteria (and for context, there are currently 1.4k) it should return just that entry and use it. Unless I’m mistaken, that appears to be the simplest way to find any single entry that matches my criteria?
The overall task is to find any Post in the database and hand it to another backend workflow where its related entries in other tables are deleted before then deleting the original Post itself.
What am I doing wrong here? I have loads of workflows that have recently started failing due to database query timeouts. As this was so simple I thought I could fix this and move onto the more difficult ones but even this appears unfixable!
The workflow is failing immediately as it begins and this is step 1 so I’m sure this is the step that it’s failing on.
@alan8 Thanks for your reply. Team is another database table, so in this instance it’s a dynamic link to a specific entry in that table.
The reason why my query isn’t more specific is because I want it to iterate through all posts for one Team until they are all deleted.
Am I right in my assumption about how the database query is working? Simply look until you find any single entry and give it to me? I can’t see why that would timeout a database, regardless of the size of that table or the number of matching entries!
I could add an extra iteration loop that searches for posts in a 24 hour time period, iterating back by 24 hours every time the current period’s count of Posts is zero but that seems like unnecessary complexity in this scenario.
What are these ‘hard limits’ you are referring to? Is there some bubble documentation somewhere that defines database limits that could cause timeouts?
OK so the first thing is the hard limits are 50K. I reckon you have probably had your database grow since you first wrote the query and this is why you are now triggering time outs.
See more here:
The second, is that to me your query looks really inefficient. You are effectively doing something like a cartesian link in the database. You are effectively saying to the table - OK, I want to search you by first throwing a bunch of yourself at yourself and then once you’ve worked out which one of you I want, I want to you to query you by that. And then send the results back to me.
The search criteria (or the index) is effectively a subset of itself. So the database has got to do a lot of searching just to work out what you what to narrow down the search by, before it performs the query.
If I were doing the query and I were looking at finding the Posts in the post table made by a particular team, I would structure the database differently.
Firstly I would have all the Posts indexed by the Team ID. And then I would ask the Posts table to find only the posts that related to that Team ID.
So something like: Do search of Team where Team_ID = Team.Unique_ID.
It will give just the records you need, which should be way less than 50k records and then you can manipulate the result set from there.
What do you mean by indexed by… we dont control indexes in Bubble. Also, if i inderstand your post correctly, there’s no reason to link by ID as opposed to the actual thing.
@david.j.hansford something is off in your query as you can search many millions without getting a timeout. If you are looping thru how are you making sure not to select the same post on the next run?
Also what’s happening in the next step? Is there a timeout on the first run ever or only when subsequent steps from previous runs are still in progress?
Finally why the single record? Run the same api wf on a list for all posts by that team??
@code-escapee We don’t directly set the indexed but it makes good sense to the use the ones that bubble gives us.
The record’s (thing’s) unique ID is indexed internally by Bubble. It is the single most efficient means of obtaining a reference to the row or group of rows in a table.
Additionally, Bubble, optimised for queries that are run regularly by apps. The more than query is narrowed down the more efficient the caching and the faster your app (and the less WU you consume).
For me, using an object to filter down the retrieval of an object, or group of objects is fuzzy and inefficient.
And most probably the root cause of @david.j.hansford timeouts. And as you have spotted, he is probably requerying data already quieried.
The mantra for data retrieval is to be a specific as you can about what data you need/want and then work with that smaller data set.
If this is your main task, then I would suggest using Triggers on the related tables to delete whatever related Post entries without doing any searches.
@alan8
It’s possible you’re about to explain something to me about how Bubble queries work that will make my searches way more efficient.
Here’s what I’m doing along with my understanding of why:
My Posts table has a field called Team which is of type Team - in other words, a direct link to the entry in the Team table which corresponds to the origin of the post in question. (see image taken from the Posts database fields page.)
The entire workflow is just 2 steps: (1) Find a single Post with Team matching the provided value and send it to a second workflow that deletes other things in other tables related to this Post before deleting the Post itself. (2) Schedule to repeat the workflow after 2s pause (to allow for the other workflow to complete) as long as there are more Posts to delete
So I have 2 questions following on from your post:
Is saving Team of type Team against every entry in the Posts table not essentially saving the Team’s unique ID?
Is passing a Team of type Team in a database search therefore not the same as a WHERE clause that requires the Post to have that Team’s UUID? I do all my searches this way because I assumed that was the case and it would therefore be well indexed…
Firstly I would have all the Posts indexed by the Team ID. And then I would ask the Posts table to find only the posts that related to that Team ID.
So something like: Do search of Team where Team_ID = Team.Unique_ID.
If this is a significant improvement in terms of indexing and database efficiency then I may have to consider a database migration, though it would be a huge headache to do that this late in the game. The app has been running for over 2 years with loads of customer data.
The mantra for data retrieval is to be a specific as you can about what data you need/want and then work with that smaller data set.
Putting the indexing question above aside, I don’t see how I can be more specific in this particular search. The one thing that differentiates the Posts I’m looking for from every other Post in the table is the Team associated with it…
If you are looping thru how are you making sure not to select the same post on the next run?
Also what’s happening in the next step? Is there a timeout on the first run ever or only when subsequent steps from previous runs are still in progress?
Simply by scheduling the next wf for 2s in the future which is long enough for the post to have been deleted. It’s possible this logic is flawed but the timeout happens on step 1 of the first loop so this isn’t causing the issue.
In fact, when triggering this workflow does work, it functions as I expect it to.
Finally why the single record? Run the same api wf on a list for all posts by that team??
I’ve been burned by that before! Some teams can have >50k Posts in my database when I need to delete them. Bubble always used to fall over with big lists so I converted to doing loops instead. Perhaps they’ve updated to avoid this issue, but I’m still on the old payment plan which I believe has limited headroom when working with lists. (I always see a prompt to move onto their new payment plans because they say they can handle working on lists better…)
If this is your main task, then I would suggest using Triggers on the related tables to delete whatever related Post entries without doing any searches.
I don’t understand your suggestion. What would trigger the trigger? This is a manual task I currently trigger when necessary from a button click on an admin page.
I think you’re correct. The simplest way is Search:first item, though.
Other than that, I cannot see you doing anything wrong. Unless, you Post is HUGE because you’ve been storing images in it in base64 (check the content of the post in the data tab to find out…)
Not sure what @alan8 is on about - there’s no loops here. Add some more constraints and that might optimise your search, but I’d bet the issue is huge data on each thing!
Here’s a separate (but related) question:
What’s the most efficient way of asking the database if there are any more entries that match my criteria in a table? I used to do Do a search for X :count >0
However I assume that isn’t very efficient. As a result I am currently trying:
Do a search for X :items until #1 :count >0
My reasoning is that hopefully this stops searching as soon as it finds any entry that matches the criteria and returns with it immediately. Knowing there is at least 1 is enough in this scenario.
@georgecollier
No images are saved to the Posts table. All non-text is saved elsewhere, with a link created through a Posts_attachments table. The message can be very long, but it will only be normal characters. For context, these are slack messages that we are saving.
Add some more constraints and that might optimise your search
Again, it’s hard to do that as the Team is the only distinguishing feature in this case
Well my concern is the search that includes the items until #1 as well as first item seems to be effectively two queries of the same thing. Which seems inefficient. And clearly there is an issue here where the volume of data being returned is causing the hard limit to kick in.
Yes, this is correct. In Bubble’s backend, it’s just a string that’s the unique ID of the thing.
This isn’t directly relevant to your issue, but schedule API workflow on a list would be much more performant and reliable than the recursive workflow now. Schedule API workflow on a list of Posts (the BE workflow just takes a post) and in the workflow delete the relevant post. I can’t remember off the top of my head but I think this works to 100,000 things on unsorted lists.
It’s not
Hmmm, maybe date? Is a Team not attached to an Enterprise/Organisation too?
Yes, it could double the query time in theory but they’re logically equivalent.
Search items until #1:first item = Search:first item = Search items until #1 in terms of data returned.
Thank you for this! Just read his post, will read the rest of the thread when I have time. Understanding how to maximise WU, database load and server load as becoming top priority for me as my app is pushing against the limits more and more as it grows
My other concern is that Team is not the primary index, where as the UID is. Yes, Team will work, but will Bubble create a secondary index automatically? And with the volume in play here is that a factor in the OP problems?