Bulk Delete in Parent-Child Tables

Problem

As has been mentioned previously (see here, here, and here, for example), bulk operations in Bubble are not performant. Even the recommended approach of recursive back-end workflows on a list are a frequent source of Bubble’s grating “Workflow Error - Sorry we ran into a bug. Try back later”. This log message shows up for lists larger than a few hundred items during execution of workflows with immediate iteration. From our experience this log message is actually a symptom of back-end capacity limits timing out a workflow, and not an actual error in any workflow. Even Bubble’s own documentation cautions against the liberal use of making changes to large lists.

Parent-child relationships between database tables in Bubble offer a special case of this problem. In this case we want deletion of a single parent record to cascade to the deletion of a (very) large number of child records. We have been able to successfully “hack” a solution that satisfies the following constraints:

  1. Bulk deletes of the child records of a parent record have to look immediate on the front-end.
  2. Actual garbage collection needs to be throttled and deferred to off peak load times. Even if it takes hours behind the scenes.
  3. We cannot rely on directly modifying any fields, because of the capacity limitations of back-end bulk workflows.

Experiments

We have experimented, and failed to varying degrees, with six different techniques:

  1. The obvious, delete a list of things. When used on the client-side, with sufficiently large lists this will hang the browser. On the server-side we frequently hit the ambiguous “Workflow Error - Sorry” message. When there are more than a few hundred records we find that workflow will fail on about 20% of the list items.
  2. Next we tied flagging records for eventual garbage collection using make changes to a list of things. This faces the same issues. Either hanging the browser on the front-end. Or failing on about 20% of records when run on batches larger than a few hundred.
  3. We tried to schedule a workflow on a list, where the workflow deletes each item. Again the same 20% failure rate above a few hundred records.
  4. We tried to schedule a workflow on a list flagging records for eventual garbage collection. We still hit the 20% failure rate above a few hundred records.
  5. We next set up a recursive workflow where each iteration was run immediately, deleting the items on the list. Even with explicit recursion the deletes failed about 20% of the time when the list was more than a few hundred records.
  6. Finally we tried recursive workflows with immediate iteration to flag for eventual garbage collection. This attempt resulted in the same 20% failure rate.

Our only success was to explicitly iterate operations with a one second delay between iterations, and between all other workflow invocations for good measure.

This presented us with a problem. How do we signal to the front-end user that the child records are at least intended for deletion, when the only way to ensure deletion was to introduce a one second delay into every workflow invocation?

Solution

The key insight for us was that a reference to a parent record in a child record is set to empty as soon as the parent record is deleted. Note that this is not true for lists. Deleting an underlying “thing” referenced in an explicitly constructed list does not remove the reference from the list. Even though the reference is now invalid.

Our insight constrained the problem of immediate front-end notification of deletion down to setting up the appropriate search restrictions on child records. In our all displays of child records we require the reference to the parent record to be not empty. Our setup works as follows:

  1. We create a parent table, for example records of uploaded files File.
  2. We create a child table, for example results extracted from the files Result.
  3. We create a field in the child table that references the parent record in the parent table, for example Result's Source File.
  4. When the child record created we immediately populated the parent reference field, for example Result's Source File = A Parent File.
  5. We constrain the front-end to display only child records where the parent reference field is not empty, for example Result's Source File isn't empty.
  6. When a parent record is deleted Bubble automatically sets all the references to the record to empty, important to note this happens for singular database fields only, not explicitly constructed lists.
  7. During off peak times we slow drip the garbage collection of child records with an empty parent field.

Postscript

The deeper “theoretical” reason this works is that we are determining a priori, when the record is created, which records can be bulk deleted together. When specifying arbitrary criteria for choosing records for bulk deletion this method is much more awkward.

9 Likes

Duuude! Thanks for the transparency and excellent structure. Absolutely top-notch.

1 Like

I appreciate the work you did here. I have been struggling with this issue and will try this new method referencing parent records. I sure wish bulk data manipulation was more stable and predictable

We have also just implemented a stable garbage collector. The gist is every 128 seconds we run a recursive workflow that deletes the oldest 128 records, matching a particular criteria, such as created more than a year ago, and then schedules itself for 128 seconds in the future.

It’s not fast, but it does slowly free up storage used by historical data that is no longer needed. And as long as in the long term average you are creating less than one record per second the garbage collector will eventually catch up and only delete the most recently expired records.

2 Likes

@emmanuel any plan to make this nice and safe at some point? This would also benefit Bubble as we would clean up our databases.

1 Like