Limitations of running workflows on a list

Hi @emmanuel and @josh,

I’m trying to understand the limitations, if any, of scheduling workflows on a list.

I have read several numbers in the past, but I’m not sure which are current and which need updating, especially in light of the new pricing plans such as “Professional” and higher, where we have dedicated capacity. So I’m asking a few questions that may help me, and hopefully other forum members, better understand the capabilities of the product.

The questions assume that I have a list of N items that will be fed to a workflow with S steps. That workflow will be run on each element in that list. Assume for simplicity that with the 2 units of power that come as default in the Professional plan, running all the S steps in the workflow on just one element of the list takes T milliseconds.

  1. Is there a maximum number of items that can be in the list? In other words, what is the maximum value of N? What happens if this number is exceeded?

  2. Is there a maximum number of steps that can be in the workflow? In other words, what is the maximum value of S? What happens if this number is exceeded?

  3. Is there a maximum duration for one workflow run, i.e. the time taken to execute the S steps in the workflow? In other words, what is the maximum duration T? What happens if this number is exceeded?

  4. Is there a maximum duration for the entire list to be processed? In other words, and ignoring for now the effect of the “Interval” parameter when scheduling a workflow, is there a maximum for N x T? What happens if that number is exceeded?

  5. In light of the new pricing model for plans with reserved capacity, what is the need for the Interval parameter? How does Bubble use it and what happens if it is left blank?

  6. Is there any difference in any of the above if the workflows are nested? A nested workflow is created when workflow A, the parent running on a List of A items, has a step that invokes workflow B, the child workflow, on a List of B items.

Many thanks in advance, and as usual, thank you very much for such a great product!

Alex


Hey Alex,

Thanks for the clear, well-structured question. Couple things to note upfront:

  • The main limit with big list operations right now is a time limit on workflow run execution – ie, from the evaluation of the event to the last action of the workflow. That time limit is 5 minutes for workflows run via the scheduler, and slightly longer (it’s somewhat variable, but think of it as 5 minutes and you’ll be safe) for workflows run in response to a user action or API call. If a workflow crosses this limit, it’ll end with a “too busy” error that will stop further processing.

  • We have limits on the max length of a list that can be stored as a list-type field in the database; currently it is 10,000 on the main cluster. Attempting to store a list longer than that will result in an error.

  • We have a limit on the number of results that can be returned from a search; currently it is 50,000 on the main cluster. Exceeding this won’t result in an error: you’ll just get 50,000 results instead of 50,001 results. Searches that don’t actually return results, like counting the number of items, will process more than 50,000 (so, you might see higher counts).
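
As a rough illustration of how the two size limits above behave differently (one raises an error, the other silently truncates), here is a minimal Python sketch. It is not Bubble code: the function names and the exception class are hypothetical, and the constants are simply the main-cluster numbers quoted above, which may change.

```python
# Hypothetical model of the two size limits described above.
# The numbers are the main-cluster values quoted in this post.
LIST_FIELD_MAX_ITEMS = 10_000   # max length of a list-type field
SEARCH_MAX_RESULTS = 50_000     # max results returned by a search


class ListTooLongError(Exception):
    """Hypothetical error standing in for the one raised when storing an oversized list."""


def search_result_count(matching_items: int) -> int:
    # Exceeding the search cap is not an error: the result is truncated.
    return min(matching_items, SEARCH_MAX_RESULTS)


def check_list_field(length: int) -> None:
    # Exceeding the list-field cap is an error.
    if length > LIST_FIELD_MAX_ITEMS:
        raise ListTooLongError(
            f"list of {length} items exceeds {LIST_FIELD_MAX_ITEMS}"
        )
```

Under these assumptions, a search matching 50,001 items yields 50,000 results, while trying to store 10,001 items in a list field fails outright.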

So to answer your questions:

  1. The limits described above apply in the following ways: if it takes > 5 minutes to schedule all the workflows, the schedule API workflow action will fail with a “too busy” error. Items scheduled prior to that happening will still run. If it takes more than 5 minutes to load the initial list (ie, a very slow search), nothing will get scheduled. And the max search length and max item list length constrain how long the source list can be.

  2. There are no hard limits on S, but in practice the 5 minute constraint will apply. I’ve never seen workflows with thousands of steps, and it’s possible things will break / you’ll run into weird bugs if you try to do that.

  3. This is the 5 minute limit mentioned above.

  4. Once a workflow is scheduled to run, it is independent from its peers. So if you schedule a list of 100 to run, that initial scheduling action is a single operation subject to the 5 minute limit, but as soon as the item run gets scheduled, it’s now totally separate from the other 99 items… Bubble doesn’t consider them related or track limits across them.

  5. The interval determines how far apart Bubble attempts to start each workflow. So if the interval is 1 second, Bubble will try to start each workflow 1 second apart. (Note that Bubble doesn’t strictly guarantee exactly when things will run… this is a best-effort attempt, though usually if your app has capacity, it’ll run within a second or two of the target start time. We do guarantee it won’t run too early.) If the interval is left blank, we default to 2 seconds. In light of the new pricing / capacity management, we reduced the minimum interval to 0.2 seconds, so if you type ‘0’, you’ll actually get 0.2. With the new pricing, the interval is significantly less important for protecting Bubble’s infrastructure. However, it is still useful to help users manage their own capacity consumption; spreading out the scheduled workflows avoids burning all your app’s capacity on running them. (We’ve taken other measures to help with this; for instance, if your app is low on capacity, we will wait to start scheduled workflows until it recovers, and we also put some limits on the number of workflows any one app can be running in parallel.)

  6. No, I don’t think this introduces any changes to the above.
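
To make the interval rules in answer 5 concrete, here is a small Python sketch of the arithmetic only. It is not how Bubble implements scheduling; the function names are hypothetical, and it assumes the first item targets an offset of 0 seconds from the moment of scheduling, which the answer above does not state.

```python
from typing import List, Optional

DEFAULT_INTERVAL_S = 2.0   # used when the interval is left blank
MIN_INTERVAL_S = 0.2       # smaller values (including 0) are raised to this


def effective_interval(interval: Optional[float]) -> float:
    """Apply the default and the minimum described in answer 5."""
    if interval is None:          # blank interval
        return DEFAULT_INTERVAL_S
    return max(interval, MIN_INTERVAL_S)


def target_start_offsets(n_items: int, interval: Optional[float] = None) -> List[float]:
    """Earliest target start time, in seconds, for each item in the list.

    Assumption: item 0 targets offset 0. These are targets only; a run may
    start later than its target, never earlier.
    """
    step = effective_interval(interval)
    return [i * step for i in range(n_items)]


if __name__ == "__main__":
    offsets = target_start_offsets(100)   # blank interval, 100 items
    print(offsets[:3], offsets[-1])
```

With a blank interval and 100 items, the last item’s target start is 198 seconds after the first under these assumptions, which shows how the interval spreads capacity use over time.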


Hi @josh,

Thank you very much for your answer. Although I read it as soon as you posted it, I did not have time to properly thank you for it.

A few thoughts:

  1. May I suggest that these concepts are incorporated into the “official” Bubble documentation? Most times it is difficult to find nuggets like this in the forum.

  2. On my original point 6, nesting: Nesting is required in many situations, but it also makes it easy to schedule a large number of workflows in a short period of time without giving it much thought. Based on what you explained, I imagine that a large number of workflows scheduled too close to each other, with complex queries inside, would each take longer, and that Bubble could drop some of them if they are not properly spaced and execution starts taking too long. Even though it may be obvious to most people, it may be worth warning about this possibility in the documentation. Frankly, my preference would be for Bubble to slow the offending workflows to a crawl rather than drop them, though that may not be easy to do from your side.

  3. It would be very useful if each workflow had access to 3 read only variables:

  • the number of items in the original list
  • the number of items already processed by the scheduled batch of workflows
  • the number of this workflow in the list
    This would be helpful to those who are trying to add logic that depends on whether this workflow is processing the first or the last element of the list. Right now, I’m using a bit of a kludge to accomplish this (I describe it in a PS at the end of this message).

  4. It may be very useful to have the ability to cancel a runaway workflow once it has started running.

Be well,

Alex

PS:
For those who care, you could do the following to know whether the workflow is processing the last element in a list:

a) Create a table called WFC (for workflow Control) with at least two fields: ListSize and ItemNumber.
b) Modify the workflow to accept a parameter (e.g. wfc) of type WFC.
c) Add a step before scheduling the workflow to create a new thing (a WFC) setting ItemNumber to 0 and ListSize to the count of all the elements of the list that you will pass to the workflow when you invoke it (effectively running the same query twice, one now with a :count at its end and the second time when you invoke the workflow)
d) Schedule the workflow, passing the result of the previous step as the wfc parameter (or whatever you called it in step b).

e) As the first step in the workflow, increment the value of wfc’s ItemNumber by 1
f) When appropriate (I usually do it at the end of the workflow), put steps that are executed “only when” wfc’s ItemNumber is wfc’s ListSize. This is what tells you that you are in the last element of the list.

g) You may want to clean up WFC at this point or periodically, because otherwise it will grow forever.
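
For readers who think better in code, steps a) to f) can be sketched roughly as follows in Python. This is only a simulation of the logic, not Bubble code: the class and function names are made up for illustration, and the items are processed strictly one at a time, so it does not model what happens if several scheduled workflows run in parallel.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WFC:
    """Step a: the control record, with ListSize and ItemNumber."""
    list_size: int
    item_number: int = 0


def run_workflow(item: str, wfc: WFC, on_last: Callable[[str], None]) -> None:
    """Steps b, e, f: one workflow run for one list element."""
    wfc.item_number += 1                    # step e: count this run
    # ... the real work on `item` would go here ...
    if wfc.item_number == wfc.list_size:    # step f: "only when" condition
        on_last(item)


def schedule_on_list(items: List[str], on_last: Callable[[str], None]) -> WFC:
    """Steps c, d: create the WFC, then run the workflow on each element."""
    wfc = WFC(list_size=len(items))         # step c: ListSize = list count
    for item in items:                      # step d: pass wfc to every run
        run_workflow(item, wfc, on_last)
    return wfc


if __name__ == "__main__":
    schedule_on_list(["a", "b", "c"], lambda item: print("last item:", item))
```

In this sequential sketch the “last” branch fires exactly once, on the final element. Step g (cleaning up old WFC records) is left out.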

Hope this helps!

