What is Bubble's policy on pushing changes?

Hey all –

First, @samnichols, really sorry about what happened with your app. I just checked in with the team and they’re pushing a fix right now (they tried to get it out yesterday but ran into some technical issues). It should be live shortly. If you continue to have problems, please keep the support thread you have open with us updated, and we’ll get to the bottom of them.

Addressing the more general question:

Our policy is to never knowingly release a change that will break user apps without creating a new opt-in Bubble version (or an experimental feature that will later become a version). There have been a couple of occasions in the past year where we’ve violated this, generally because of miscommunications or newer people on the team, but each time we discussed the incident internally and reiterated our commitment not to do this.

We do currently release changes that we do not think will break user apps, including bug fixes and performance improvements. This is not an ideal policy, because it leads to incidents like this one: Bubble is a very complicated product, and sometimes we miss ways in which something we think is a harmless change will cause problems. In this case, 99% of repeating groups were unaffected, but for people like @samnichols in the unlucky 1%, a bug like this can be incredibly disruptive.

We have this less-than-ideal policy because of technical limitations on our end. Our “versions” tab in the editor doesn’t represent truly different versions of our code. Instead, it’s basically an “if statement” behind the scenes: “if the user app is on version 5, do this; if they are on version 6, do that”. Because of that, we think it’s unsustainable to release all code changes as new Bubble versions: our codebase would quickly become riddled with different branches and totally unmaintainable, and we think the overall outcome for users in terms of the reliability of Bubble’s platform would be worse rather than better.
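To make the “if statement” description concrete, here is a minimal sketch of version gating. Everything in it is hypothetical and for illustration only: `AppSettings`, `bubbleVersion`, and `roundQuantity` are made-up names, not Bubble's actual code.

```typescript
// Hypothetical sketch: none of these names come from Bubble's codebase.
interface AppSettings {
  name: string;
  bubbleVersion: number; // the version the app has opted in to
}

// Every behavior change that needs an opt-in adds another branch like this,
// which is why gating every change would pile up branches over time.
function roundQuantity(app: AppSettings, value: number): number {
  if (app.bubbleVersion >= 6) {
    // Behavior for apps that opted in to version 6 or later.
    return Math.round(value);
  }
  // Legacy behavior, kept for apps still on version 5 or earlier.
  return Math.floor(value);
}

const legacyApp: AppSettings = { name: "legacy", bubbleVersion: 5 };
const upgradedApp: AppSettings = { name: "upgraded", bubbleVersion: 6 };

console.log(roundQuantity(legacyApp, 2.5));   // 2
console.log(roundQuantity(upgradedApp, 2.5)); // 3
```

Both code paths live in the same deployed codebase, so a bug elsewhere in that codebase reaches apps on every version.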

Currently, our tools for having truly different versions of the Bubble codebase are the Scheduled cluster and Dedicated clusters. All apps on the Immediate cluster are on the same underlying commit, all apps on the Scheduled cluster are on the same commit, and then each Dedicated cluster is on its own commit.
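As a rough mental model of the cluster setup described above, the sketch below maps each cluster to a single deployed commit. The cluster names follow this post; the commit labels, the `acme` cluster ID, and the function names are assumptions made up for the example.

```typescript
// Hypothetical model: cluster names follow the post, everything else is assumed.
type Cluster =
  | { kind: "immediate" }
  | { kind: "scheduled" }
  | { kind: "dedicated"; id: string };

// One deployed commit per cluster: a shared one for Immediate, a shared one
// for Scheduled, and a separate one for each Dedicated cluster.
const deployedCommit = new Map<string, string>([
  ["immediate", "commit-c"],
  ["scheduled", "commit-b"],
  ["dedicated:acme", "commit-a"],
]);

function clusterKey(cluster: Cluster): string {
  return cluster.kind === "dedicated" ? `dedicated:${cluster.id}` : cluster.kind;
}

// Every app on a given cluster resolves to the same commit.
function commitFor(cluster: Cluster): string {
  const commit = deployedCommit.get(clusterKey(cluster));
  if (commit === undefined) {
    throw new Error(`No deployment recorded for ${clusterKey(cluster)}`);
  }
  return commit;
}

console.log(commitFor({ kind: "immediate" }));
console.log(commitFor({ kind: "dedicated", id: "acme" }));
```

The unit of "which code am I running" is the cluster, not the individual app.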

This bug did in fact make it to the Scheduled cluster, because the number of users it impacted was small enough that we didn’t realize there was a regression until the code had already been shipped to Scheduled. Generally, when we push a regression like this, we get a spike in bug reports that we can flag as a drop-everything-and-fix event; this time around, unfortunately, the bug reports trickled in more gradually and the team didn’t realize what had happened until the code had already moved to Scheduled.

That’s the current state of affairs. We know this is not ideal, but we want to be transparent with you all about what our capabilities are at the moment. We’re working on making our platform more reliable from several different angles:

  • On the infrastructure side, we are making some deep investments that should eventually give us the capability to have every single Bubble app on its own version of the codebase, so that users have full control over when they upgrade. I can’t give an ETA on this, because this is a major workstream and there are still a bunch of technical details we’re sorting through, but I could see it potentially happening in 2023.

  • We have a culture around building new automated tests each time a regression ships, to gradually close off gaps in our test coverage so that events like this can’t happen again. While there are always going to be corner cases, our goal is to make it so that instead of bugs that affect 10% or 1% of our apps, the only bugs getting past tests are ones that affect 0.1% or 0.001% of our apps, so that incidents like this one affecting you personally become very rare.

  • We are also in the process of converting our codebase to TypeScript, which unlike JavaScript is a statically typed language. This is a technical measure that will knock out huge categories of potential bugs. It won’t necessarily fix everything (and I think this particular incident likely would not have been prevented by TypeScript), but again, it pushes us towards a world where any individual app gets broken more and more rarely.

  • Our Success team is continuing to improve their processes around identifying regressions and escalating them to engineering, with the goal of continually increasing our ability to sort out “we just pushed code that broke something” from “user stumbled over a long-standing bug”, so that we can triage effectively. Our overall response times have been trending steadily downward, though in this case we didn’t identify this as a regression requiring immediate response as fast as we would have liked.

  • We recently added the capability to roll things out as experimental features, and plan to rely on that more heavily going forward as a way to test things out in a lower-stakes way.

  • I do want to remind people that for scaling businesses, you can contact sales to discuss moving to Dedicated. I realize that the price point for this is a deal-breaker for a lot of you, so it isn’t a short-term fix, but I’m putting it out there because it means that as your business gets increasingly successful on Bubble, there’s a path forward to stay on the platform even as your reliability needs increase.
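On the TypeScript point in the list above, here is a small sketch of the kind of bug category a type checker can rule out. The `GridLayout` type and `cellCount` function are invented for illustration, and the example assumes the compiler's `strictNullChecks` option is enabled; it is not meant to represent the bug in this incident.

```typescript
// Hypothetical types for illustration; not taken from Bubble's codebase.
interface GridLayout {
  rows: number;
  columns?: number; // optional: may be missing on some saved layouts
}

// In plain JavaScript, `layout.rows * layout.columns` silently evaluates to
// NaN when `columns` is missing. With strictNullChecks enabled, TypeScript
// rejects that expression at compile time because `columns` may be undefined,
// so the missing case has to be handled explicitly.
function cellCount(layout: GridLayout): number {
  const columns = layout.columns ?? 1; // explicit default for the missing case
  return layout.rows * columns;
}

console.log(cellCount({ rows: 4, columns: 3 })); // 12
console.log(cellCount({ rows: 4 }));             // 4
```

The check happens before the code ships, which is what makes it complementary to regression tests that only cover cases someone thought to write down.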

I hope this provides a bit more transparency into how we think about incidents like this. Again, I know how frustrating this is, especially because it feels out of your control. We can’t guarantee 100% uptime – no tech platform reaches that level of reliability – but we’re aiming to keep pushing our uptime levels higher and higher for all of our users.
