I appreciate the move to incident.io and the faster update times when there is an outage, but the old status page had some really useful charts for debugging issues, particularly for tracking latency spikes that could explain failed API calls and other odd user experiences.
@fede.bubble or others, is there any plan on bringing those charts back for the new status page?
Thanks for raising this, I’m definitely tracking it and looking into ways to expose the metrics again. The tricky part is that there isn’t a quick solution, so it will take engineering resources.
We previously showed 4 metrics (see below, along with their sources):
“Externally measured latency” - Pingdom
“Median end-to-end page load” - custom metric on AppOptics
“Median system api latency” - custom metric on AppOptics
“Successful system api requests” - custom metric on AppOptics
The custom metrics were stored in AppOptics, which (as you can see here) reaches end-of-life this month, so we had to switch to a different observability vendor (Observe, which is actually much better for our use case).
So even if we hadn’t switched to the new status page, the AppOptics EOL would have broken those custom metrics on the old one anyway.
We are investigating whether there’s an easy way to expose the Pingdom metric while we think about a more custom solution for the other metrics.
FYI, right now is one of those times when it would be helpful to see the external latency. I’m getting elevated errors on inbound APIs due to application timeouts, which I suspect are caused by latency spikes, but it’s hard to diagnose without those charts.
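In the meantime I’ve been thinking about capturing this myself. A minimal probe sketch, assuming a hypothetical lightweight workflow endpoint (`/api/1.1/wf/health`) in my own app — not an official Bubble metric, just a rough stand-in for the old chart:

```python
# Rough DIY external-latency probe while the status page charts are gone.
# The endpoint URL is hypothetical -- point it at any cheap workflow you own.
import time
import requests

ENDPOINT = "https://myapp.bubbleapps.io/api/1.1/wf/health"  # hypothetical
INTERVAL_S = 60  # probe once a minute

while True:
    start = time.monotonic()
    try:
        resp = requests.get(ENDPOINT, timeout=15)
        elapsed_ms = (time.monotonic() - start) * 1000
        print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {resp.status_code} {elapsed_ms:.0f}ms")
    except requests.exceptions.RequestException as exc:
        elapsed_ms = (time.monotonic() - start) * 1000
        print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} ERROR after {elapsed_ms:.0f}ms: {exc}")
    time.sleep(INTERVAL_S)
```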
@tj-bubble any updates here? I’m noticing a consistent pattern where latency spikes sharply for a short window during US mornings, which triggers all sorts of error alerts. Most of the time it’s fine, because Bubble hasn’t actually dropped the requests (just delayed processing and responses), but it would be helpful to see the stats / trends to confirm there aren’t other issues outside of Bubble that need to be addressed.
For example, these are screenshots from two runs of the same external workflow hitting Bubble in my app. This afternoon, function times are reasonable (they’re database-heavy, hence still a bit long), but in the morning we kept exceeding 10s timeouts and raising alarm bells.
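For now my workaround on the caller side is a longer timeout plus retries with backoff, so a delayed-but-not-dropped request doesn’t page us. A rough sketch, again with a hypothetical workflow URL:

```python
# Tolerate morning latency spikes: longer timeout + retries with backoff,
# since Bubble appears to be delaying (not dropping) the requests.
# The endpoint URL is hypothetical -- substitute your own API workflow.
import time
import requests

def call_bubble_workflow(payload, retries=3, timeout_s=30):
    url = "https://myapp.bubbleapps.io/api/1.1/wf/process"  # hypothetical
    for attempt in range(retries):
        try:
            resp = requests.post(url, json=payload, timeout=timeout_s)
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException:
            if attempt == retries - 1:
                raise  # out of retries: surface the failure
            time.sleep(2 ** attempt)  # 1s, 2s backoff before retrying
```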
No great option yet for exposing it, but thanks for the additional clarity; I’m tracking those details in the requirements.
In your case, aren’t you able to tell from the existing observability on your app (Logs tab → WU charts + logs) that you’re not being throttled by a burst of activity?
Thanks for mentioning this; I wasn’t aware that Bubble had any throttling, so this is a new area for me to plan around. I do have some workflows that result in very spiky consumption (large data deletions and writes; I’m working on migrating this component off Bubble, since the data itself is a “snapshot” that doesn’t get realtime updates, so we don’t need the full infrastructure). I’ve sketched one way I might smooth those bursts at the end of this post.
I don’t see any specific indicators of being throttled on the App Metrics page, and I don’t know where else I would find that. Our peak is around 5k concurrent workflow runs, and we’re on the Growth plan, so I think that’s well below the 25k API hard limit that I saw here: Hard limits | Bubble Docs
Is there another limit I should be aware of / planning around?
Historically, these latency spikes happen in the mornings, and I used to see in the page load and latency charts that Bubble generally had slower performance in the mornings (presumably from more usage across the network / apps). That’s why I assumed this particular issue was correlated with system latency, not any throttling of my app specifically.
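Here’s the sketch I mentioned above: batching the bulk deletes/writes and pacing the batches instead of firing everything at once, to flatten the consumption spike. The endpoint, batch size, and pause are all my own assumptions, not documented Bubble limits:

```python
# Smooth spiky bulk operations: send records in small paced batches
# rather than one burst of concurrent workflow runs.
# Endpoint and pacing values are assumptions, not Bubble-documented limits.
import time
import requests

def paced_bulk_delete(record_ids, batch_size=100, pause_s=1.0):
    url = "https://myapp.bubbleapps.io/api/1.1/wf/bulk_delete"  # hypothetical
    for i in range(0, len(record_ids), batch_size):
        batch = record_ids[i : i + batch_size]
        resp = requests.post(url, json={"ids": batch}, timeout=30)
        resp.raise_for_status()
        time.sleep(pause_s)  # pause between batches to flatten the spike
```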