Hi all,
I want to share an update and provide transparency around the recent outages and instability we’ve seen, including incidents earlier today.
The second half of March has been rough and we apologize for the impact that has caused. We’ve had multiple disruptions, and they stem from two main issues:
1. Dependency on a third-party CDN (unpkg)
Many Bubble plugins depend on code hosted on unpkg, a third-party CDN that experienced a major outage on March 19 and again this morning. Because these dependencies can block page load, an unpkg outage can cause downtime for apps with those plugins installed.
While this issue is outside our direct control, we’re looking into ways to mitigate it. That includes giving plugin authors better tools to avoid blocking dependencies on third-party CDNs and encouraging widely used plugins to adopt those tools.
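To make "non-blocking dependency" concrete, here is a minimal Python sketch of the general pattern: try each source with a short timeout and carry on without the dependency if every source fails. It is only an illustration under stated assumptions, not Bubble's plugin API or the tooling described above; `load_dependency`, `http_fetch`, and the example URLs are hypothetical names.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_fetch(url: str, timeout_s: float) -> str:
    """Fetch a URL with a hard timeout so a slow host cannot stall the caller."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def load_dependency(
    sources: Sequence[str],
    fetch: Callable[[str, float], str] = http_fetch,
    timeout_s: float = 2.0,
) -> Optional[str]:
    """Return the payload from the first source that responds.

    Returns None if every source fails, so the caller can degrade
    gracefully instead of blocking on an unavailable CDN.
    """
    for url in sources:
        try:
            return fetch(url, timeout_s)
        except OSError:
            # Covers connection errors and timeouts; try the next source.
            continue
    return None


# Hypothetical usage: a primary CDN followed by a self-hosted mirror.
# payload = load_dependency([
#     "https://cdn.example/lib.js",
#     "https://mirror.example/lib.js",
# ])
# if payload is None:
#     ...  # render the page without the optional feature
```

The key design choice is that a failure produces `None` rather than an exception or an indefinite wait, so an outage at one host degrades a feature instead of taking the whole page down.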
2. Database shard crashes
Two of our eight main database shards have experienced repeated performance degradation and crashes over the last two weeks, leading to downtime and poor app performance for affected users. These issues are caused by long-running queries that exhaust database resources and trigger cascading failures.
We’re addressing this on several fronts:
- Long-term: We’re implementing automatic cancellation of long-running queries. This depends on completing a major internal project: migrating off a legacy stored procedure language. We’ve been working on this since early last year and are nearly done: 47 of 54 procedures have been migrated, including the most complex. We’re targeting end of April for full completion.
- Short-term: Given the urgency of the last two weeks, we’re also fast-tracking partial query cancellation for the procedures we’ve already migrated. It’s trickier to implement this before the full project is complete, but we’re optimistic we’ll have a working solution in the next few days.
- Ongoing mitigation: We’re addressing problematic queries individually as they arise. This hands-on approach has helped us maintain some stability, and we believe a change we made this morning will improve things further.
- Manual intervention: Our team is monitoring the situation around the clock and manually restarting affected databases to restore functionality when needed.
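For readers curious what automatic cancellation of long-running queries can look like in principle, here is a minimal Python sketch of the selection step: given the queries currently running, pick the ones that have exceeded a time budget and are safe to cancel. This is a simplified illustration, not Bubble's implementation; `RunningQuery`, `queries_to_cancel`, the `cancellable` flag (standing in for "runs only through already-migrated procedures"), and the 30-second budget are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RunningQuery:
    query_id: int
    started_at: float  # seconds, on the same clock as `now`
    cancellable: bool  # hypothetical: True if the query is safe to cancel


def queries_to_cancel(
    running: Iterable[RunningQuery],
    now: float,
    max_runtime_s: float,
) -> List[int]:
    """Return ids of cancellable queries that have run longer than the budget."""
    return [
        q.query_id
        for q in running
        if q.cancellable and (now - q.started_at) > max_runtime_s
    ]


# Example: at t=100s with a 30s budget, only query 1 qualifies.
running = [
    RunningQuery(query_id=1, started_at=0.0, cancellable=True),   # over budget
    RunningQuery(query_id=2, started_at=0.0, cancellable=False),  # not safe yet
    RunningQuery(query_id=3, started_at=95.0, cancellable=True),  # still young
]
print(queries_to_cancel(running, now=100.0, max_runtime_s=30.0))  # [1]
```

A watchdog would run a check like this periodically and issue the database's cancel command for each returned id. The `cancellable` flag mirrors the short-term approach above: cancellation applies only where it is known to be safe, and coverage grows as more procedures are migrated.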
We know you rely on us to keep your apps — and your businesses — running smoothly. Our Platform engineering team, the largest team at the company, is fully focused on scalability, reliability, and performance. Every improvement they deliver benefits all our users automatically — no effort needed on your end.
Finally, as a reminder, for mission-critical use cases, I encourage you to talk to our Sales team about our Enterprise offerings, especially dedicated hosting, by reaching out to sales@bubble.io or submitting a request here.
Thanks for your patience as we work through this.
—Josh