[Feature Enhancement] Improved Scalability of Schedule API Workflow on a List

Update: Please see my most recent post in the comments, which addresses frequently asked questions.

Hi everyone,

We’re excited to share that we’ve released an enhancement to Schedule API Workflow on a list. Previously, this action was only capable of scheduling a few thousand workflows before timing out. Now, users can reliably schedule and execute workflows on lists of tens of thousands of things!

This ~10x improvement in scalability makes Schedule API Workflow on a list the best way to accomplish bulk data work in Bubble — whether you’re looking to implement complex business logic or make simple updates on large data sets.

Until today, you may have been using recursive workflows to iterate over lists as a workaround (i.e., building a workflow that runs on one item and then schedules itself on the next item until the list is complete). While that can work well when implemented correctly, Schedule API Workflow on a list is faster, easier to set up, and more efficient. Schedule API Workflow on a list also eliminates some of the inherent risks with recursion, like getting caught in an infinite loop or having the recursive chain broken by a single workflow encountering an error.
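
For anyone who hasn’t built one, here’s a rough sketch of that recursive pattern in Python terms (process_item and recursive_workflow are illustrative stand-ins; the real thing is a no-code workflow):

```python
# Illustrative model of a recursive Bubble workflow; all names are
# hypothetical stand-ins, since the real workflow is built visually.

def process_item(item: str) -> None:
    print(f"processing {item}")

def recursive_workflow(remaining: list) -> None:
    if not remaining:              # list exhausted: the chain ends cleanly
        return
    process_item(remaining[0])     # do the work for exactly one item
    # Re-schedule this same workflow on the rest of the list. If this
    # step ever errors, every later item is silently skipped (the broken
    # chain), and a bad stop condition means an infinite loop.
    recursive_workflow(remaining[1:])

recursive_workflow(["a@example.com", "b@example.com", "c@example.com"])
```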

Example: Running an email campaign

Let’s say you have 20K users, and you want to send them all a personalized email. You may also want to log this outreach by updating a field on each User to indicate the last time they were contacted. You can create an API workflow that sends a personalized email and updates this field for a given User (in the screenshot below, we’ve named the API workflow “monthly-email-send”), and then use Schedule API Workflow on a list to run this workflow on your entire user base.
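
In code terms, the per-User workflow and the list scheduling would look roughly like the sketch below (the User class, send_email, and the loop are illustrative assumptions; the actual workflow is configured visually in the editor):

```python
# A loose code analogue of the “monthly-email-send” API workflow; the
# User class and send_email are hypothetical, not Bubble APIs.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class User:
    email: str
    first_name: str
    last_contacted: Optional[datetime] = None

def send_email(to: str, body: str) -> None:
    print(f"-> {to}: {body}")

def monthly_email_send(user: User) -> None:
    """One run of the API workflow: email one User, then log the contact."""
    send_email(user.email, f"Hi {user.first_name}, here’s your update!")
    user.last_contacted = datetime.now(timezone.utc)

# “Schedule API Workflow on a list” then runs the workflow once per User:
users = [User("a@example.com", "Ada"), User("b@example.com", "Bob")]
for user in users:
    monthly_email_send(user)
```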

This is the first of many improvements on our product roadmap focused on making it easier to manage databases in Bubble as they scale in size and complexity. Over the next six to 12 months, we’ll continue to scale the power of bulk workflows and database actions by orders of magnitude, as well as provide better tooling for observability, management, and controls. Especially if you’re already using recursive workflows in your app, watch for these updates as they may provide better solutions for working with large datasets in Bubble. We’re excited to hear any feedback you have. Thanks so much!

59 Likes

Alright! :+1:

1 Like


This is great, but there are a couple of questions:

  1. Was this feature enhancement actually pushed a while ago and only announced now? People have been noting that it seemed to be possible before this announcement was made; just trying to work out whether this announcement covers that change or is an entirely new feature deployed today.

  2. Does this always work on heavy data types? This was also discussed (heavily :wink: ) on a recent forum thread, where silent failures on heavy data types were a particular issue.

  3. Can we get an explicit recommended limit? Does it depend on data size at all? If you were building an app, how many records would you be comfortable scheduling this on? 50K? 100K?

Again great update, just looking to squeeze some more details out :slight_smile:

9 Likes

Nice one. Will it tell me that it encountered an error? Or how many workflows it actually ran? Say, I sent 20k items, but it only ran 16k?

4 Likes

Ooooo, snarky. Like we haven’t had a long time to work around these potential issues :slight_smile:

Like when you didn’t allow recursive workflows, so we (someone smarter than me) invented ping-pong workflows.

However, this is a step forward.

+1 to this. Up until a day ago we were still seeing the issue where it was the LIST that couldn’t be created, not that you couldn’t schedule on that List.

Wake me up when you can delete 100,000 rows in less than a day :slight_smile:

17 Likes

+1 to this - we need more details. What’s the current limitation? What’s the max we can safely push through this workflow?

8 Likes

Same questions here.

  1. Let’s say we have hundreds of thousands of rows that need to be updated. Will this work on all rows without any problem?

  2. If this update can only accommodate around 20,000 rows, will simultaneously updating batches be possible? For example, running the API workflow on a list for bucket 1 (rows 1-20,000), bucket 2 (rows 20,001-40,000), bucket 3 (rows 40,001-60,000), and so on (see the slicing sketch after this list)? Will it cause any timeout or error?

  3. How long will it take to update 100,000 rows using this update?
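
To make question 2 concrete, here’s a sketch of the bucket slicing (the arithmetic only; whether Bubble would run these batches safely in parallel is exactly what’s being asked):

```python
# Slice a large row range into fixed-size buckets, as in question 2.

def buckets(total_rows: int, batch_size: int = 20_000):
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        yield (start + 1, end)           # 1-based row numbers

for i, (first, last) in enumerate(buckets(100_000), start=1):
    print(f"bucket {i}: rows {first:,}-{last:,}")
# bucket 1: rows 1-20,000 ... bucket 5: rows 80,001-100,000
```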

1 Like

And we would test this but it costs $9 of WU to do it once!

3 Likes

@steven.harrington

  1. What would be the WU implications here? I was recently running a recursive workflow that started with roughly 2-3K records, the list shrinking with each invocation. Each invocation alone cost 100+ WUs (just the invocation of the recursive workflow, not the actions within it; the invocation did nothing but call the API with a “list = list minus item” step, no DB call, nothing). I ended up spending about 400K WUs in a few hours of updating one field on 2-3K records.
    So I am really keen to know what the WU calculation would be here (rough math after this list).

  2. The workflow I mentioned in my previous point was also similar to the example you gave. I had run some WhatsApp campaigns and needed to record, for each user, how many notes had been sent to them in the past. If such a small update can cost 400K WUs for 2-3K users, I wonder how we are supposed to build production applications within 175K WUs per month, all things put together. I was also told that WU consumption was this high because I had a long list in my recursive workflow (even though that has been the recommended way to do list operations). So it is important to understand the WU implications and whether they are connected to the length of the lists, the size of the elements, etc., as @georgecollier has also mentioned.

  3. The example you gave is not quite clear. You said we may want to do two things: “send the email to the user and update the field for when they were contacted.” Did you mean that we need two separate “run on a list” workflows for this? Or that the single “monthly-email-send” workflow you showed does both things, updating the user field and sending the email?
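
For reference, the back-of-envelope math on the numbers in point 1 (my own division; Bubble doesn’t publish a fixed per-record rate):

```python
# Back-of-envelope on the figures quoted in point 1 (my arithmetic only;
# Bubble does not publish a fixed per-invocation WU rate).
total_wu = 400_000
records = 2_500              # midpoint of the quoted 2-3K records
print(total_wu / records)    # => 160.0 WU per record for a one-field update
```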

Really? I thought we were recommended to use this for only 100 items in a list earlier.

As @georgecollier has commented, what is the meaning of 10x? As per an earlier note I have, we could run this function on only 100 elements. So can we now run it on 1K elements?

Great questions. +1 to these.

+1 And also when deleting 100K rows doesn’t empty your and your neighbour’s bank accounts.

3 Likes

When I last looked closely here, the issue was that all 20k users were being fetched in the frontend before being scheduled.

That would result in a crazy fetch and a timeout.

Is that not the case anymore?

If I’m running Schedule API Workflow on a list, can I be confident that the search isn’t being pulled unnecessarily into the front end?

Thanks
Zubair

2 Likes

This is a great improvement over the old limitations…

But the trouble with this is still that it’s not reliable at higher volume, with no way to know why or where it stopped, and it’s hard to know where the upper limit is.

As another test, I just scheduled a workflow to run on a list of 100k items and modify a single field, and this time (to my surprise), the list was scheduled without an issue (it took around 5 minutes to begin running)…

But the workflows stopped running after around 48k items, with no indication of why in the logs (no errors, no timeouts… nothing). It only took around 15 mins to modify those 48k things, and cost about 50k WUs.

So, at the moment, I’d say this is great for making changes to a few thousand things (and much faster, and cheaper in WUs than a recursive workflow), but for more than 30k or so it’s a bit too hit-and-miss for serious use at this stage…

At least with recursive workflows, if they do break, you know where… so you can kick things off again from the same place.

It will be interesting to watch the improvements here over the next 6-12 months.

(also, it would be interesting to know what causes this to simply stop running… rather than just slowing down…)

21 Likes

@adamhholmes We salute you, our brother, for this 50k WU test :vulcan_salute:

Hard to believe that just over a year ago, we could test something like this as part of our normal plans (albeit with the caveat that capacity could be maxed out)… Now it just cost you an entire free month plan’s worth of WU, and 50% of your development plan budget, in one click. One user… just manipulating data…

@NigelG Looking at these “bulk workflow” updates from Bubble, I can see why they’d be a priority after years of this being so painful for us (well, it’s still painful… but ya know)… They’re finally about to make one hell of a return on the development time. :upside_down_face:

I really don’t like bringing all this up again, but it was the first thing I thought of when I saw that 50K WU cost.

9 Likes

Top notch Adam material.

6 Likes

Thanks for testing this for us and sharing your findings @adamhholmes

2 Likes

@steven.harrington another question. Will the same “Schedule API Workflow on a list” be what’s invoked if we do a “Bulk” operation from App Data in the editor? Will that have the same scalability, WU cost, and other characteristics as the workflow action?

Recursive workflows have a huge benefit: you have a queue.
Scheduling on a list creates chaos. Scheduled workflows may overlap each other. That’s a pain.
A queue helps you control traffic. You can add a dynamic delay between iterations if needed.

You’ll be pissed off if you try to build, for instance, import logic using Schedule on a List. Import logic usually requires creating and modifying things in the DB. Without a queue, you’ll end up with duplicates in your system, because when you create or update a thing, it doesn’t happen immediately; it’s an async action.
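
A rough Python sketch of the difference; the serial loop below stands in for a recursive workflow’s queue, and all names are illustrative:

```python
# Serial processing (a queue, like a recursive workflow) makes a
# check-then-create import safe; names here are illustrative only.
import time

def import_row(row: dict, seen: set) -> None:
    if row["email"] in seen:      # duplicate check is reliable in a queue
        return
    seen.add(row["email"])        # "create the thing"
    print(f"imported {row['email']}")

def run_queue(rows: list, delay_s: float = 0.1) -> None:
    seen: set = set()
    for row in rows:
        import_row(row, seen)
        time.sleep(delay_s)       # dynamic delay between iterations

run_queue([{"email": "a@x.com"}, {"email": "a@x.com"}, {"email": "b@x.com"}])
# A scheduled-on-a-list run is closer to firing all rows at once: both
# "a@x.com" rows can pass the duplicate check before either async create
# lands, which is how the duplicates described above appear.
```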

5 Likes

This is a quick message to @josh.

Why did people pay me top dollar as a developer? Because I not only added the new features but also thought ahead about the five edge-case scenarios and the five FAQ questions people would have.

It was completely predictable, based on the Bubble forum (even from 2020 threads), that people would want to know about WU cost (2023), whether this works for 100K rows, and what happens if a workflow fails. Yet these questions were not addressed when launching this update.

That’s the kind of work that is needed to get Bubble to the enterprise, and to turn me back into the hardcore Bubble fan I was in 2020, when the company was still at a maturity and funding level where I accepted these shortcomings.

A WU cost of 50K for 100K rows is ridiculous, seeing how every SMB can do this for free on other backends, every day of the month.

19 Likes

Moreover, if you need to cancel all scheduled workflows triggered via “Schedule on a List,” it’s not a quick process. You’ll need to set up logic to store the scheduled IDs first. It’s much easier and more cost-effective to cancel a recursive workflow.

Another benefit of recursive workflows is the ability to create a Live Progress Screen, showing information such as “10 of 2000.”
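
Roughly, the bookkeeping looks like this (schedule_on_item and the cancel step are hypothetical stand-ins for Bubble’s actions):

```python
# Why cancelling a list-scheduled run is slow: you must persist every
# scheduled-workflow ID yourself, then cancel them one at a time.
import uuid

def schedule_on_item(item: str) -> str:
    return str(uuid.uuid4())        # stand-in for the returned workflow ID

items = [f"thing-{i}" for i in range(2000)]
scheduled_ids = [schedule_on_item(item) for item in items]
for wf_id in scheduled_ids:
    pass                            # one cancel call per stored ID

# A recursive workflow, by contrast, knows its own position, so a live
# progress screen is just the current counter:
def recursive_step(done: int, total: int) -> None:
    print(f"{done} of {total}")     # e.g. "10 of 2000"
    # ...process one item, then schedule recursive_step(done + 1, total)

recursive_step(10, len(items))
```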

8 Likes

This is a great point. We need to know if the improved version of “run workflow on a list” has a way to serialise the operations. Running operations in sequence is often critical. And even when we don’t need a strict sequence, we may want to avoid collisions, e.g. if I am running a workflow on a list of transactions to update users’ payment records, I would not want two parallel workflows updating the same user’s data.
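
For example, one way to avoid that collision is to group the transactions by user first and run one serial pass per user; a rough sketch with illustrative names:

```python
# Group transactions by user so each User row is only ever touched by
# one serial pass, never by two parallel workflows. Illustrative only.
from collections import defaultdict

transactions = [
    {"user": "u1", "amount": 10},
    {"user": "u2", "amount": 5},
    {"user": "u1", "amount": 7},
]

by_user = defaultdict(list)
for tx in transactions:
    by_user[tx["user"]].append(tx)       # one bucket per user

for user, txs in by_user.items():
    # Schedule ONE workflow per user; its transactions run in sequence.
    total = sum(tx["amount"] for tx in txs)
    print(f"{user}: {len(txs)} transactions, total {total}")
```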

Another important point: we need to understand how to cancel this list operation and how to track progress.

2 Likes