Hi all,
This is our March community update! Read last month’s update here.
This was a pretty mixed month. We ended up spending a lot of our time firefighting various production issues (more on that below), but at the same time, some of our efforts to scale up our support started making a visible difference in our response time numbers, and we added a bunch of great new teammates.
We welcomed:
- Sam and Patrick on our Success team
- Alex on our Marketing team
- Kaela on our People team
- Shyheme on our Engineering team
- Yiwen on our Finance team
- Fabian to lead our Sales team
As always, our open roles can be found here. As mentioned in previous updates, these two roles are ones we think are particularly interesting for current Bubblers:
- Technical Product Support Specialist: help us do deep investigations of user-reported bugs and issues. Does not require programming experience, but previous Bubble experience is a big plus!
- Video Producer: help us create amazing educational content to accelerate the learning curve of new Bubblers!
Changes we made this month
We continued to improve our new responsive engine. In addition to numerous bug fixes and small improvements, we added drag-and-drop in the elements tree, row and column gap controls, and padding on containers. We also released a video on using layout and sizing properties in conditionals.
Our work on version control reliability continued with various bug fixes and improvements.
In terms of community-building, it was a busy month:
- We worked with some of you to help spread Bubble and no-code on social media via the #BuildInPublic and #BuildWithBubble hashtags. We want to thank everyone who applied, and while we couldn’t onboard everyone for this first trial run, we hope to do more together soon!
- We’re getting Bubble bootcamps in more schools (for instance, a private bootcamp with Notre Dame’s IDEA center). In general, we’ve been investing in bootcamps, because we think it’s a great way to help more people become Bubblers, and because we want to launch the careers of more freelancers. We know there’s a shortage of Bubble freelancers right now and want to help as many people as possible get started!
- We’ve begun preparing for the next Immerse cohort: see updates to our Immerse homepage for more details. We also posted a follow-up post checking in on some of our previous Immerse participants.
- We sponsored TreeHacks, and are looking into ways to scale sponsoring more hackathons. If you’re an interested hackathon organizer, please reach out to us!
In addition to the above, here are a few posts on our blog you may be interested in:
- A profile of an exciting web3 company built on Bubble showing how web3 pairs with no-code!
- Some tips for tracking analytics with Segment + June
- Three new App of the Day posts, and two profiles of our bootcamp instructors Dave and Nazz
This month in numbers
- New conversations via bug reports or support@bubble.io: 8,189 (down 7.6%)
- Average first response time to messages: 3h 47m during business hours (down 43%)
- Average response time to messages: 3h 44m during business hours (down 42%)
- Open tickets being investigated by the engineering team: 55 (up from 53)
- Of those, tickets that have been open longer than 7 days: 22 (down from 35)
Things on our minds
Like most of the world, we’re watching the events in Ukraine. We have a number of Ukrainians as part of the Bubble community, and our thoughts and prayers are with you as we hope for everyone’s safety.
As a reminder, experts expect the current Russian invasion of Ukraine to lead to an increase in cyberattacks. See, for example, CISA’s warning here. The most common and avoidable form of cyberattack is “phishing,” where someone attempts to trick a user into 1) providing sensitive information (e.g., passwords or bank account information), or 2) following a dangerous link in an email or text that looks like it comes from someone they trust.
- Please be extremely skeptical of any communication that seems at all suspicious.
- We highly encourage you to set up 2-factor authentication on your main Bubble account if you haven’t already done so.
- If you spot any phishing attempts on Bubble apps, please report these to legal@bubble.io and support@bubble.io.
We spent a lot of time this month dealing with various outages and production emergencies. Some were self-inflicted, some were the result of scaling pressures we had fallen behind on, and some were simply bad luck and poor timing.
The longest and most severe outage fell in the third category: Bubble’s domain was blocked by CloudFlare as a phishing website, which led to a number of our systems breaking. After discussing the incident in depth with CloudFlare, our understanding is that it was simply bad luck that this occurred, that the timing alongside all the other production incidents was coincidental, and that they’re taking appropriate steps on their end to prevent a similar incident in the future. That said, the extent of the downtime was due in part to design flaws in our caching infrastructure that caused the CloudFlare warning page to be cached even after CloudFlare reversed the mistake on their end. We are in the process of improving this aspect of our caching, and expect it to lead to an overall simpler, more resilient infrastructure.
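To make that concrete, here’s a heavily simplified sketch of the principle behind the fix (TypeScript, illustrative only, and not our actual infrastructure code): only store known-good responses in the cache, so an upstream block or warning page can never get stuck there.

```typescript
// Illustrative sketch only -- not Bubble's actual caching code.
// Key idea: never cache an upstream response unless it's a normal 200,
// so an error page or security interstitial can't outlive the upstream issue.

const cache = new Map<string, { body: string; expiresAt: number }>();
const TTL_MS = 60_000; // hypothetical cache lifetime

async function fetchWithCache(url: string): Promise<string> {
  const cached = cache.get(url);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.body; // still fresh: serve from cache
  }

  const response = await fetch(url);
  const body = await response.text();

  // Only plain 200 responses are cached. Block pages, challenges, and other
  // errors are passed through to the caller but never stored, so they disappear
  // as soon as the upstream problem is resolved.
  if (response.status === 200) {
    cache.set(url, { body, expiresAt: Date.now() + TTL_MS });
  }

  return body;
}
```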
The other major issue we’ve been dealing with has been a series of recurring database outages. After investigating, it looks like this is the result of an upgrade we made to the version of Postgres we run (more specifically, to a database extension). The upgraded version introduces a bug that breaks the tool we use to automatically defend our databases against dangerously long-running queries. Unfortunately, it would be a tremendous amount of work to downgrade our databases to a version that does not have this bug. We’re taking the following actions (a simplified sketch of this kind of query watchdog follows the list):
- We’ve disabled the specific queries that led to the outages. Since we took this step, we haven’t seen another recurrence of the same category of problem.
- We’re changing the way we run many of our longest-running queries to avoid this bug in the first place, so that our automatic defense mechanism works going forward.
- We’re planning to build a secondary backup defense mechanism that works along different lines: preventing dangerous queries from starting in the first place, instead of terminating them after the fact.
- We’re working with the maintainers of the Postgres extension to get a fix for this bug in place.
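Here’s the promised sketch of what this kind of watchdog looks like, simplified for illustration (TypeScript with the node-postgres client; the threshold and connection details are made up, and this is not our actual tooling):

```typescript
// Illustrative sketch of a long-running-query watchdog -- not our actual tooling.
// Assumes the "pg" (node-postgres) client and a connection string in DATABASE_URL.
import { Client } from "pg";

const MAX_QUERY_SECONDS = 120; // hypothetical threshold

async function cancelLongRunningQueries(): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Find active queries that have been running longer than the threshold.
    // pg_stat_activity and pg_cancel_backend are standard Postgres facilities.
    const { rows } = await client.query(
      `SELECT pid, now() - query_start AS runtime
         FROM pg_stat_activity
        WHERE state = 'active'
          AND query_start < now() - ($1 * interval '1 second')`,
      [MAX_QUERY_SECONDS]
    );
    for (const row of rows) {
      console.warn(`Cancelling backend ${row.pid}, running for ${row.runtime}`);
      await client.query("SELECT pg_cancel_backend($1)", [row.pid]);
    }
  } finally {
    await client.end();
  }
}
```

In a real watchdog there are extra decisions to make, such as choosing between pg_cancel_backend (which cancels just the query) and pg_terminate_backend (which kills the whole connection), and logging enough context to debug which queries keep tripping the threshold.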
A third category of issue we’ve dealt with has been DDoS attacks: we saw two severe attacks against our infrastructure this month. Both attacks had the same technical signature, and we are not sure what motivated them or where they originated. In both cases, CloudFlare was able to mitigate the attack automatically within 30 seconds, but during those 30 seconds our servers were overloaded enough to cause user-facing downtime. We’ve made some changes to block attacks more efficiently on our end, and believe these would be sufficient to stop a DDoS attack that follows the same pattern. We’re also working on longer-term changes to load pages more efficiently, which should improve page load performance and make us more resilient to other kinds of DDoS attacks in the future.
A fourth(!) issue we’ve had is with our real-time updates system. Users reported that, intermittently, searches would stop updating with fresh data. After investigating, we tracked the problem down to performance issues with the database that powers these updates, and found a key indicator metric that reliably predicted user reports of the problem. We found a band-aid solution to relieve pressure on that database, which has stopped the issue from occurring. We’ve started work on a longer-term fix, which involves simplifying our real-time update system in such a way that this class of problem (as well as some other problems we’ve dealt with in the past) can no longer occur.
Finally, we had some self-inflicted mistakes where we rolled out code that we later had to revert because it was breaking user apps. The worst incident involved a change in application behavior that, on reflection, we should have released as a new Bubble version rather than as an immediate bug fix. We had a long conversation as a team about that incident and agreed on some operating principles for avoiding that kind of error in the future, including taking advantage of tooling to better predict whether apps are relying on a “bug,” and a process for declaring new Bubble versions that encourages the team to use them more frequently.
The good news on the self-inflicted bug front is that our investment in automated tests has been catching more issues before they affect users, and we’ve seen the rate of releases that we have to roll back go down. We also continue to see the Scheduled tier working as a tool for sheltering apps in production use from these kinds of bugs (although the Scheduled tier does not provide protection from the infrastructure-level issues described above). All that said, we still break user apps with new code far too frequently, and changing this is something we’re working towards.
Anyway, that’s a pretty long list of issues, and it’s been an exhausting month for the team as we’ve dealt with all of them. Since these categories of issues are unrelated, it’s not obvious to us why this month in particular was so bad. Zooming out, though, we think it’s an overall consequence of our user base growing, Bubble becoming more prominent, and all our systems and processes coming under more pressure. We’re doing some internal reprioritizing to make sure we’re spending enough time on preventative maintenance to keep up with these pressures as we grow, because we know how much Bubble downtime can hurt the businesses and organizations that rely on us.
What we’re currently working on
Due to the outages described above, we didn’t make as much progress this month as we were hoping, so most of this section is copy-pasted from last month’s update. The main thread we were able to push forward was the new responsive engine. Now that we’ve shipped the improvements described above, we’re focusing our attention on the plugin editor, so that plugin developers will be able to start upgrading their plugins to work with the new responsive paradigm.
Also, as mentioned above, we’re allocating more time for preventative maintenance; we’re exploring a number of different investments we could make to improve reliability. Most of the contemplated changes are behind the scenes and won’t be directly visible to users.
Repeating the relevant updates from last month:
Our major performance push, focused on data loading and rendering, continues. We’re continuing to work on optimizing invisible elements, building a data dependency graph to optimize querying, and generating HTML and CSS upfront instead of on the fly. All of these are major projects and we expect them to take months to quarters of work, though for some of them we expect to see performance improvements along the way. We’re also pursuing smaller performance bug fixes as we find them.
Another note on performance: the new responsive engine performs significantly better than the old one, because the old engine relies on a lot of JavaScript-based manual manipulation of element positioning, whereas the new engine relies on CSS, which is much faster. So we see continuing to improve the new engine, and making it easier for both new and existing projects to move onto it, as a major performance priority as well. Some of the upcoming performance work we’re doing will only benefit the new engine, because it builds on top of the technical changes the new engine introduces behind the scenes.
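As a simplified illustration of the difference (not the actual engine code): the old approach recomputes element positions in JavaScript every time the page resizes, while the new approach describes the layout once in CSS and lets the browser do the work.

```typescript
// Simplified illustration of the two approaches -- not the actual engine code.

// Old-style: JavaScript recomputes and applies element widths on every resize,
// doing layout math on the main thread each time the window changes.
function legacyLayout(container: HTMLElement): void {
  const relayout = () => {
    const total = container.clientWidth;
    const children = Array.from(container.children) as HTMLElement[];
    for (const child of children) {
      child.style.width = `${Math.floor(total / children.length)}px`;
    }
  };
  window.addEventListener("resize", relayout);
  relayout();
}

// New-style: describe the layout once with flexbox and gap, and let the
// browser's native (and much faster) CSS layout engine handle every resize.
function modernLayout(container: HTMLElement): void {
  container.style.display = "flex";
  container.style.flexDirection = "row";
  container.style.gap = "16px"; // cf. the row/column gap controls mentioned above
  for (const child of Array.from(container.children) as HTMLElement[]) {
    child.style.flex = "1 1 0";
  }
}
```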
We’re also doing work around the new user experience and general in-editor education, to make Bubble easier to ramp up on for new users. This involves fixing some ongoing pain points, such as how long it takes to load the plugin installation UI, as well as tweaks to the onboarding and new-app experience.
Another workstream we’re kicking off is an overhaul of our network architecture with the goal of hardening our security posture. We aim to be SOC2 compliant by the end of the year, and this is one of the key prerequisites.
As mentioned above, we’re continuing to improve version control, both from a bug-fixing and feature development perspective.
On the QA front, our outsourced partners have now built 544 tests. On the TypeScript migration, we’ve now started the actual conversion work – we have one file in our main codebase written in TypeScript – and we’re making slow but steady progress off of CoffeeScript, which is down to 44.6% of the codebase.
Thank you for your patience with us this month; we earnestly hope March brings better news, both for the Bubble community and for the world.
Best,
Josh and Emmanuel