Race Conditions: Beware of Using List Fields in Objects If You Plan To Scale

We’ve been building an application on Bubble off and on for just over six years, and in general I love the team here and have been a supportive, happy Bubbler since 2017. Our primary application now has 110,000+ users, which sounds like a milestone most of us hope to hit on the way to properly scaling, but is nowhere near a level that would qualify as an ultimate goal.

There’s only one issue: Bubble’s basic infrastructure breaks at scale when your application is popular enough that multiple users try to take the same action at the same-ish time on the same field of an object in your database.

For our business, we have a list of Students as a field in an Event object. When a client books, they’re added to the Students list and the platform knows how many spaces are left by comparing the Students count to the capacity of the class. After they do that, we run a series of workflows to confirm the booking and charge them. That worked great for years. Until we actually became relatively popular and multiple users started booking classes in tight time intervals.

Because of race conditions, Bubble can’t reliably process multiple independent actions on the same list within short timeframes (a half second in our case, though longer based on comments I’ve read). Instead, each workflow changes the list based on what it looked like when that workflow started running and rewrites it as that original list plus the item being added. The issue is that if three things are being added simultaneously (when the beginning list didn’t include any of the three), the list gets rewritten three times, each rewrite overwrites the previous one, and it ultimately only includes the last item rather than all three.
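To make the failure mode concrete, here is a minimal, deterministic simulation of the read-then-rewrite pattern described above. It is only an illustration under stated assumptions, not Bubble's actual implementation: the function name and the student names are made up for the example.

```python
# Simulation of the "lost update" pattern: every workflow snapshots the list
# before any other workflow has written, then writes back snapshot + own item.
# Hypothetical names throughout; this is not Bubble's real code.

def run_overlapping_workflows(initial_students, new_students):
    stored = list(initial_students)  # stands in for the Event's Students field

    # Step 1: all workflows start at about the same time and read the list.
    snapshots = [list(stored) for _ in new_students]

    # Step 2: each workflow rewrites the field as "its snapshot plus its student".
    for snapshot, student in zip(snapshots, new_students):
        stored = snapshot + [student]  # overwrites whatever was written before

    return stored


final_students = run_overlapping_workflows(["Ana"], ["Ben", "Cy", "Dee"])
print(final_students)  # ['Ana', 'Dee']
```

Ben and Cy each "booked", but only the last writer's version of the list survives.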

The end result of this for us is that students were booking and paying and getting confirmation emails, but being deleted out of class by each re-write and having no space left when they showed up.

We built a safety net that checked whether a Student had been re-written off the Event list and then added them back a second after the workflow that was breaking had finished, but that broke because one second wasn’t long enough. We’ve asked Bubble what timeframe would be long enough for the safety net to work, but they haven’t been able to provide an answer yet.

After a week of back and forth with customer service, it’s clear that Team Bubble knows about the race condition issue, but the time and cost it would take to fix have pushed back implementation. Instead, they suggested restructuring our app to use a series of hundreds of “do a search for:count” expressions rather than relying on lists being reliably written and maintained by the recursive backend workflows we’re currently using. We may or may not do that, as it seems like it would create other issues (speed being one, processing power and cost being others).

Posting this here not as a complaint about Bubble (their customer service has actually been great about this), but as a warning to other Bubblers to be very careful about how you use list fields if you’re planning to scale your application to a point where multiple users could be taking the same action at the same time. This issue has been pretty costly for us this month both financially and in terms of our reputation with our clients, so I’m definitely wishing I’d better understood the challenges and structural considerations of scaling with Bubble before I got started.

16 Likes

Yes lists are quite annoying with the race condition issue.

Another issue is you can get “ghost things”, where when you delete something and a list field had that thing, if you don’t go out of your way to remove it from the list as well it can actually still be counted or considered still there at random times… 🤦🏼

4 Likes

Thanks for the detail and bringing this up. I have had constant issues regarding this in my own registration type app. Each year there is a specific event where I face race condition issues. I’ve tried lists, searches and almost everything I can think of, and nothing has worked perfectly. So while I would say Searches are better than Lists for this, they still have drawbacks.

My most recent solution for this specific event worked, but it would be costly with the WU pricing (searches and then double-checking searches for timestamps) and the UX is slightly ‘clunky’, so it would not make sense to use it as a standard registration process. Instead, I can toggle this registration flow on for certain events.

I don’t know if there is a simple solution to this unless Bubble works on something for us as it’s just not structured for a high demand app like this, which I’ve mentioned here before

More on the subject here:

3 Likes

You know, Xano just introduced support for these types of transactions requiring record locks :wink:

https://x.com/nocodebackend/status/1707410611259207741?s=46&t=MngJqMkZrqH11UIRuqhCPA

3 Likes

Hey @brian, thanks for raising awareness of this issue. And thanks to the other posters for the helpful comments and links.

So are they saying that would actually solve the problem? I ask because my understanding (after reading several of the linked threads) is that potential race conditions for near-simultaneous DB writes are just inherent in the Bubble platform and not limited to “list” field types. Or was Bubble offering that suggestion as a way to reduce (but not eliminate) the number and frequency of such occurrences?

Also, I’d be curious to know how you (or anyone else who cares to comment) might structure things differently if you were redesigning the data schema from scratch. For instance, instead of a list of Students on the Event type, might you create a separate type - say, Registrants - that links a Student to an Event and then Do a search for:count to get the number of sign-ups per event before proceeding to checkout?
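As a rough sketch of that alternative schema: each booking becomes its own Registration record, and the sign-up count is computed by searching rather than read from a shared list. The class and method names below are assumptions made for illustration, and `count_for_event` is only a stand-in for Bubble's "Do a search for:count".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Registration:
    """Join record linking one Student to one Event (hypothetical type)."""
    student: str
    event: str


@dataclass
class Database:
    registrations: list = field(default_factory=list)

    def create_registration(self, student: str, event: str) -> None:
        # Each booking adds its own record; no shared list gets rewritten.
        self.registrations.append(Registration(student, event))

    def count_for_event(self, event: str) -> int:
        # Stand-in for "Do a search for Registrations (event = E):count".
        return sum(1 for r in self.registrations if r.event == event)

    def spaces_left(self, event: str, capacity: int) -> int:
        return capacity - self.count_for_event(event)


db = Database()
for name in ["Ben", "Cy", "Dee"]:
    db.create_registration(name, "Yoga 101")
print(db.spaces_left("Yoga 101", capacity=10))  # 7
```

Note that checking the count and creating the record are still two separate steps, so this structure avoids one booking erasing another but does not by itself prevent overbooking the last few spaces.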

I’m just anticipating a future need of my own and want to make sure I understand. Thanks for any info or insights.

Yes your Registrants suggestion is the correct structure.

Things like “likes” on a post should also have the same structure, no list fields anywhere

If you are a user who is part of a Discord server, same thing: you have a Profile with User and Server fields

That’s why there’s a lag when you click on a user on Discord to see their mutual servers, it’s doing a search for Profiles with that User that intersects with your Profiles, then shows :each item’s Server

2 Likes

maybe the general idea is if the thing being changed is accessible by multiple users, certain actions of that workflow need to be “do a search for”, especially if it’s in relation to how many things are in reference to another thing.

and if the action is a single user operation of their own data, it’s ok and faster to use lists.

1 Like

I have a couple of indices with thousands of text items in a field with multiple users updating the field at any one time. Our use cases may differ, but how I avoid issues is by loading this list client side. Whenever something needs to update, 2 things happen:

One WF updates the list client side. Then another WF schedules a backend WF to update the index with a one second delay (just in case).

My users only need to see that something changed so the UI/UX needs to reflect it but nothing happens to the actual list. The backend WF is where I make sure that an Index’s list field gets updated safely.

1 Like

Yeah, but wouldn’t the rationale be entirely different for that use case? Seems to me the issue there is about not adversely impacting performance by keeping the Post data type lightweight as opposed to any concerns over race conditions.

Yes, this is how I structure things in my apps. I never use list fields (almost never). I normally create the data type that will be a 1:1 relationship and do the search with constraint on either thing (student or event) dependent on which data type is readily accessible (ie: is the student looking at their registrations or is the need for the event to see the number of registrations.)

3 Likes

As do I, but for reasons related to performance and flexibility, not anything related to race conditions.

The motivation behind my inquiry was that the response the OP received from Bubble support seemed to suggest that such a data design pattern might also be best for minimizing race conditions. It seems clear that race conditions can’t be completely avoided in some situations, but it’s good to know the best-practice approach for high-demand scenarios.

That said (and without intending to diverge too far from the original topic), there’s nothing wrong with having a list of things on another data type, as long as one understands the implications and limitations.

1 Like

Interesting.

Yeah, Emmanuel seems to have said as much in that 5-year-old post.

That said, something did just occur to me, which seems plausible at first blush (but maybe not “fair”)…

If the core issue is timing-related, perhaps one way to address it is by injecting some randomness at the appropriate times. So for example, once the registration reaches some threshold - let’s say 85% of capacity - during a high-demand period, then start randomizing the workflow executions within a span of several seconds in an attempt to “spread out” the DB writes associated with [nearly] concurrent incoming requests.

Yes, it means the last few “slots” might not be filled in exactly the order they were requested (and someone who actually pressed their enter key before someone else might miss out), but maybe that’s acceptable in some situations.

And of course, the user might be waiting for a few seconds, but just display a busy message:

       Demand is high. Checking availability…

…along with maybe a slot machine animation. :grinning_face_with_smiling_eyes:

I’m sure it still wouldn’t guarantee no race conditions, but I wonder if it would be an improvement. :thinking:
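The randomization idea above can be sketched roughly as follows. The function name, the 5-second window and the parameter names are assumptions for illustration; only the 85% threshold comes from the post. As noted, this spreads writes out but does not guarantee anything.

```python
import random


def booking_delay_seconds(booked, capacity, threshold=0.85, max_jitter=5.0, rng=random):
    """Return how long to wait before running the booking workflow.

    Below the threshold there is no delay; at or above it, a random delay
    between 0 and max_jitter seconds is chosen to spread out the DB writes.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if booked / capacity < threshold:
        return 0.0
    return rng.uniform(0.0, max_jitter)


print(booking_delay_seconds(booked=10, capacity=100))  # 0.0
delay = booking_delay_seconds(booked=90, capacity=100)  # somewhere in 0..5 seconds
```

Passing a seeded `random.Random` instance as `rng` makes the delay reproducible for testing.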

2 Likes

I ran into this a few months ago with a list of files field where files would end up orphaned when the data was updated in an API workflow. After a back and forth with support one of their team acknowledged the possibility of this. In technical terms, they said that their database doesn’t initiate a lock on items before making changes, which can lead to this behavior. This was very frustrating to learn after-the-fact, now that I had to identify and fix possibly thousands of orphans.
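For readers unfamiliar with the term, here is a small sketch of what a lock does in general. It uses Python threads and `threading.Lock` purely as an analogy; it says nothing about how Bubble's database is built, and the `Event` class is hypothetical.

```python
import threading


class Event:
    """Hypothetical event whose list is updated under a lock."""

    def __init__(self):
        self.students = []
        self._lock = threading.Lock()

    def add_student_locked(self, student):
        # The lock makes "read the list, then write it back" one indivisible
        # step, so no other thread can write in between.
        with self._lock:
            snapshot = list(self.students)
            self.students = snapshot + [student]


event = Event()
threads = [
    threading.Thread(target=event.add_student_locked, args=(f"student-{i}",))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(event.students))  # 50
```

Without the `with self._lock:` line, two threads could take the same snapshot and one student would be silently dropped, which is the behavior described in this thread.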

I know the point of your post is not to complain about Bubble, but really this seems like a recurring issue, where long-time users who rely on the platform discover that Bubble doesn’t work the way they expected it to, and their app suffers. I think it’s time that Bubble publishes a general guide to how their platform works for technically inclined people, which includes things such as their database design, specifics on workflow run order, etc…

2 Likes

@brian could rescue the existing list structure by adding the registry table as an onboarding/offboarding mechanism:

onboard:
A. create registry item: action “add”, event: event E, student: student S
B. schedule backend workflow, parameter event E

  1. update event: students add list: search for registry (action “add”, event E)'s student list
  2. delete registry item: action “add”, event: event E, students: event’s student list

In a race condition, step 1 would add multiple students at once, which could be already added, which is okay.
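The onboarding steps above can be sketched as follows, assuming plain Python lists and dicts in place of Bubble data types. The function names and the dictionary keys are illustrative only. The key property is that step 1 is a merge that can safely run more than once.

```python
def onboard(registry, event_id, student):
    # Step A: every booking creates its own registry item; nothing is overwritten.
    registry.append({"action": "add", "event": event_id, "student": student})


def process_registry(registry, events, event_id):
    """Backend workflow (step B) for one event."""
    students = events[event_id]

    # Step 1: add every pending student to the event's list, skipping anyone
    # who is already on it, so running this twice is harmless.
    for item in registry:
        if item["action"] == "add" and item["event"] == event_id:
            if item["student"] not in students:
                students.append(item["student"])

    # Step 2: delete the registry items whose students are now on the list.
    registry[:] = [
        item for item in registry
        if not (item["action"] == "add"
                and item["event"] == event_id
                and item["student"] in students)
    ]
    return students


registry = []
events = {"E": ["Ana"]}
onboard(registry, "E", "Ben")
onboard(registry, "E", "Cy")
print(process_registry(registry, events, "E"))  # ['Ana', 'Ben', 'Cy']
print(registry)  # []
```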

1 Like

I’ve faced this often and ended up removing the thing-to-be-deleted from all satellite/connected lists first, before deleting it.
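A minimal sketch of that "clean up references first, then delete" order, using plain Python containers as stand-ins for data types and list fields (all names here are hypothetical):

```python
def delete_thing(thing_id, things, list_fields):
    """Remove thing_id from every list that references it, then delete it."""
    # First: strip the reference out of each connected list field, in place.
    for list_field in list_fields:
        list_field[:] = [item for item in list_field if item != thing_id]

    # Then: delete the thing itself (no error if it is already gone).
    things.pop(thing_id, None)


files = {"f1": "syllabus.pdf", "f2": "waiver.pdf"}
event_files = ["f1", "f2"]
student_files = ["f2"]

delete_thing("f2", files, [event_files, student_files])
print(event_files, student_files)  # ['f1'] []
```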

2 Likes
