Following up on some of the questions and concerns –
Re:
… that’s a great point – that was our main hesitation with this before, but we’d likely be comfortable releasing it on the new WU plans. I can’t promise we’ll prioritize it immediately: given the way our client-side caching works, implementing this feature isn’t trivial, and we’re trying to carve out space in our roadmaps for more reliability, observability, and bug fixing. But I will highlight that feedback to the team because it’s definitely worth reconsidering.
Here’s what we currently provide:
- We have a 24/7 on-call engineering rotation, so that there’s always someone who can be paged in the event of an emergency
- We have an automated alerting system that will wake up the on-call engineer if it detects an emergency
- We have 24/7 tier 1 customer support. While they generally aren’t able to diagnose issues themselves, one of their responsibilities is to monitor incoming bug reports for patterns indicative of an infrastructure emergency, and wake up the on-call engineer as needed.
This isn’t perfect:
- Our automated alerting has gaps, and we’re still calibrating thresholds; as @payam.azadi mentioned above, in this case, the amount of degradation did not meet the threshold that would set off the alert.
- It can be difficult for our 24/7 support teams to discern patterns when an issue only affects a small fraction of our user base. While Bubble has many apps that rely on real-time updates for mission-critical functionality, for the vast majority of apps it is a nice-to-have feature, so an outage doesn’t result in as many bug reports as an outage affecting a more universally depended-on feature.
This isn’t an excuse – I want us to get to a state ASAP where we don’t miss a major feature degradation like this one. Part of the reason I’m excited to welcome Payam to the team is his previous experience at companies with world-class operations, and I expect to see significant improvements to our observability and alerting over the coming months.