We’ve been building an application on Bubble off and on for just over six years. In general I love the team here and have been a supportive, happy Bubbler since 2017. Our primary application has 110,000+ users now, which sounds like a milestone most of us hope to hit on the way to properly scaling, but it’s nowhere near a level anyone would call an ultimate goal.
There’s only one issue: Bubble’s basic infrastructure breaks at scale when your application is popular enough that multiple users try to take the same action at the same-ish time on the same field of an object in your database.
For our business, we have a list of Students as a field in an Event object. When a client books, they’re added to the Students list and the platform knows how many spaces are left by comparing the Students count to the capacity of the class. Once they’ve been added, we run a series of workflows to confirm the booking and charge them. That worked great for years, until we actually became relatively popular and multiple users started booking classes in tight time intervals.
Because of race conditions, Bubble can’t reliably process multiple independent actions on the same list within a short window (half a second in our case, though comments I’ve read suggest it can be longer). Each workflow changes the list based on what it looked like when that workflow started running and rewrites it as that original list plus the item being added. So if three items are added simultaneously, and the starting list didn’t include any of the three, each write overwrites the one before it and the list ultimately includes only the last item rather than all three.
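To make the failure mode concrete, here is a minimal Python sketch of the lost-update pattern described above. It is only an illustration: the `event` dictionary, the function names, and the student names are all hypothetical, and nothing here reflects how Bubble actually stores or writes data.

```python
# Hypothetical in-memory stand-in for an Event record (not Bubble's storage).
event = {"students": [], "capacity": 10}

def read_snapshot(record):
    """Copy the list as it looks when a workflow starts."""
    return list(record["students"])

def write_back(record, snapshot, student):
    """Overwrite the whole list with 'snapshot plus my item'."""
    record["students"] = snapshot + [student]

def book_concurrently(record, students):
    # Every workflow takes its snapshot before any of them writes,
    # which is what near-simultaneous bookings amount to.
    snapshots = [(s, read_snapshot(record)) for s in students]
    for student, snapshot in snapshots:
        write_back(record, snapshot, student)
    return list(record["students"])

result = book_concurrently(event, ["Ana", "Ben", "Cho"])
print(result)  # only the last writer survives
```

All three bookings "succeed" from each workflow's point of view, but the stored list ends up holding one student, because each write replaces the list with a copy that predates the other two.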
The end result for us was that students were booking, paying, and getting confirmation emails, but were then being removed from the class by each rewrite and had no space left when they showed up.
We built a safety net that checked whether a Student had been re-written off the Event list and then added them back a second after the workflow that was breaking had finished, but that broke because one second wasn’t long enough. We’ve asked Bubble what timeframe would be long enough for the safety net to work, but they haven’t been able to provide an answer yet.
After a week of back and forth with customer service, it’s clear that Team Bubble knows about the race condition issue, but the time and cost it would take to fix it have pushed a fix off. Instead, they suggested restructuring our app to use a series of hundreds of “do a search for:counts” rather than relying on lists being reliably written and maintained by the recursive backend workflows we’re currently using. We may or may not do that, as it seems like it would create other issues (speed being one, processing power and cost being others).
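The general idea behind that suggestion can be sketched outside of Bubble. In the hedged Python example below, each booking is its own row and the remaining capacity is derived by counting rows, so concurrent bookings add records instead of rewriting one shared list. The `booking` table, its columns, and the `spaces_left` helper are assumptions made up for illustration, and SQLite merely stands in for a database; this is not Bubble's API.

```python
import sqlite3

def spaces_left(conn, event_id, capacity):
    """Derive remaining capacity by counting booking rows for the event."""
    (booked,) = conn.execute(
        "SELECT COUNT(*) FROM booking WHERE event_id = ?", (event_id,)
    ).fetchone()
    return capacity - booked

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per booking instead of a list field on the event.
conn.execute("CREATE TABLE booking (event_id INTEGER, student TEXT)")

# Each booking is an independent insert; none of them rewrites the others.
for student in ("Ana", "Ben", "Cho"):
    conn.execute(
        "INSERT INTO booking (event_id, student) VALUES (?, ?)", (1, student)
    )

remaining = spaces_left(conn, 1, capacity=10)
print(remaining)
```

Note that counting records only removes the overwrite problem. A "check the count, then insert" sequence can still overbook a class if two users pass the check at the same moment, so the capacity check would need its own safeguard.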
Posting this here not as a complaint about Bubble (their customer service has actually been great about this), but as a warning to other Bubblers to be very careful about how you use list fields if you’re planning to scale your application to a point where multiple users could be taking the same action at the same time. This issue has been pretty costly for us this month both financially and in terms of our reputation with our clients, so I’m definitely wishing I’d better understood the challenges and structural considerations of scaling with Bubble before I got started.