Tricks for queuing API workflows?

Anyone got any tips for creating a queue of API workflows that run in sequence and not simultaneously? My current use case is using AI to generate sections of a document. The generation uses the content of the previous sections to prompt the generation of the current section.

Users can generate each section one by one, and they should be able to queue up multiple at a time (so if they click generate on sections A, B, and C, section A is generated, then section B, then section C, rather than each API workflow running as soon as it's scheduled).

A recursive workflow only really works when the user knows how many sections they want to generate when they kick it off - they can’t come back two minutes later and add more section generations to the queue.

Interested to hear anyone’s workarounds or solutions. I guess I’m looking for a way to put an API workflow queue in a funnel so that only one can run at a time and the next one in the queue will start when the previous one finishes. I’m thinking of something like a ‘Queue’ data type that is searched for and, if found, scheduled to run when the current API workflow is complete?

Had this use-case before.

I implemented a Queue data type with a workflow ID and a status. Haven't come up with a better solution yet.
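In code terms, the shape is roughly this (a TypeScript sketch purely for illustration; the field names are mine, not anything Bubble provides):

```ts
// Illustrative shape for the Queue data type described above; in Bubble these
// would be fields on a custom data type, and the status is maintained manually.
type QueueStatus = "pending" | "running" | "complete" | "failed";

interface QueueItem {
  workflowId: string;  // the scheduled API workflow's ID, captured when scheduling
  status: QueueStatus; // custom status, updated by the workflow itself
  createdAt: Date;     // keeps the queue ordered (first in, first out)
}
```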


Does Bubble provide both the workflow ID and the status of the workflow as values, or did you need to capture the workflow ID and create your own status? I'm wondering if Bubble provides some kind of status value, similar to a progress percentage.

@georgecollier you can do something similar to what you are thinking and what @redvivi said. I often use the term Processor for the data type I use to track backend workflows. For this use case, your Queue could have a list field of backend workflows that are 'to run', another for those that are 'completed', and a third for 'total workflows'. 'Total workflows' counts every workflow triggered, 'to run' holds only those remaining, and 'completed' holds those that are finished, so you can compare against 'total workflows' to confirm everything completed, while 'to run' lets a user keep adding more as they wish.
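As a rough sketch of that structure (TypeScript as illustration only; field and function names are assumptions, not Bubble's API):

```ts
// The invariant being tracked: toRun + completed accounts for totalWorkflows.
interface Processor {
  toRun: string[];        // workflow IDs still waiting to run; users can keep appending
  completed: string[];    // workflow IDs that have finished
  totalWorkflows: number; // every workflow ever triggered for this queue
}

// When a user requests another section, it simply gets appended to the queue.
function enqueue(p: Processor, workflowId: string): void {
  p.toRun.push(workflowId);
  p.totalWorkflows += 1;
}

// When a backend workflow finishes, move it from 'to run' to 'completed'.
function markComplete(p: Processor, workflowId: string): void {
  p.toRun = p.toRun.filter((id) => id !== workflowId);
  p.completed.push(workflowId);
}

// Everything triggered has finished.
function allDone(p: Processor): boolean {
  return p.toRun.length === 0 && p.completed.length === p.totalWorkflows;
}
```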


Custom status.


That's not entirely true. While the recursive workflows you're thinking of are more WU efficient because the iteration count is statically defined, you can have a recursive workflow dynamically determine whether a next iteration is needed. For small lists the WU difference is low and can be outweighed by the benefit of smarter recursion, which is exactly what you're describing here.

The specific setup for dynamic recursion depends on your schema, but mainly on how much you're letting users go berserk on queue management.
If you only need users to be able to add to a simple queue a few minutes later, you don't need anything too fancy; you don't even need a new data type, you could just iterate through a list on the datathing.
That lets users add to the queue (add to the list) while the recursive workflow is iterating, as long as you also implement error handling that reschedules the same iteration (or deletes the queue) if something goes wrong. You also need a way to determine whether a recursion is already active on a particular datathing, so you know whether to initiate the recursive workflow or just add to the queue.

Lists are ordered and simple, but if you use them, the API workflow itself should not update the list, or you risk race conditions with the backend and frontend modifying the list simultaneously. I would personally save the total iteration count on the datathing and avoid removing completed iterations from the queue. You can then use iteration = queue:count to determine whether the recursion has completed. You could always reset both at a later date, when the chances of the user being online and active are very low.
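To make the dynamic-recursion idea concrete, here's a minimal sketch of the logic (plain TypeScript standing in for a Bubble recursive backend workflow; the names and the generateSection call are hypothetical):

```ts
interface SectionThing {
  queue: string[];   // section prompts; the frontend only ever appends to this list
  iteration: number; // last completed iteration, saved by the backend workflow only
}

// Placeholder for the real AI generation call.
async function generateSection(prompt: string): Promise<string> {
  return `generated: ${prompt}`;
}

async function runIteration(thing: SectionThing): Promise<void> {
  // Recursion is complete when iteration = queue:count.
  if (thing.iteration >= thing.queue.length) return;

  const prompt = thing.queue[thing.iteration];
  try {
    await generateSection(prompt);
    thing.iteration += 1; // only the backend advances the counter
  } catch {
    // Error handling: rerun the same iteration rather than advancing.
  }
  // By the next check, the user may have appended more items to the queue.
  await runIteration(thing);
}
```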

If you want a more complex queue, with status reporting/multi dependencies etc, then using a new datatype with individual entries for each queue item would be the best solution.

Edit: I was assuming you're saving the API responses to the database, but I now realise you might not be. In that case you could use Local storage to record which sections have been sent for generation, and determine from there whether a section can be sent for generation or has to wait for its dependencies' responses.

I am :slight_smile: Thanks for the ideas guys, good to know I'm on the right track and there are lots of good points I can consider when implementing. Will report back with any pros and cons once done.


Have you found a solid working solution for this situation?

Actually, just today.

My use case is as follows:

  • I have a file management system in my app.
  • I have a trigger set up so that when a File’s Parent File is changed, various other lists are updated + hierarchies calculated.
  • I have been adding a bulk move feature. That means that when I change the Parent File of a list of Files (i.e. the folder they're in), all the triggers run virtually simultaneously and some lists break due to race conditions.

So, I had to solve: how can I slow down the rate my DB trigger runs such that the actions don’t encounter race conditions?

High level overview: Have a queue data type, and a queue job option set. The queue job option set contains the amount of time that should be spaced between workflows of this type. In the trigger, schedule an API workflow to run at the Current date/time + (number of pending jobs * time to allow per job).

  1. A Queue Job is created when a trigger runs, before a workflow is scheduled. Creating a Queue Job means 'I want to do something ASAP, but for whatever reason it needs to be slightly spaced out'.

  2. Schedule an API workflow to run at a point in the future based on the number of pending (complete = no) items in the queue.

That schedule date expression takes a count of the pending queue items and multiplies it by an option set attribute that says how many seconds to allow per workflow of this type (I set 2 seconds for this option).


If I set the queueTimeSeconds in my option set to 5 seconds, and there are 10 pending queue items, it’ll schedule to run in 50 seconds.
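Expressed as a small calculation (names are illustrative; in Bubble, pendingCount would be 'Search for Queue Jobs (complete = no):count' and queueTimeSeconds the option set attribute):

```ts
// Schedule date = current time + (pending jobs × seconds allowed per job).
function scheduledRunTime(pendingCount: number, queueTimeSeconds: number, now = new Date()): Date {
  return new Date(now.getTime() + pendingCount * queueTimeSeconds * 1000);
}

// 10 pending jobs at 5 seconds each => roughly 50 seconds from now
console.log(scheduledRunTime(10, 5));
```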

This isn't foolproof - if the workflows don't run on time for some reason, you could run into some issues, but on the WU plans, even when scheduling loads of WFs, everything still runs pretty much on time.

A modified approach is closer to what @boston85719 suggested, where you schedule a workflow if and only if there are no pending jobs. Using this method, in the relevant backend workflow, you need to schedule the next job once it’s complete. So, from nothing in the queue to adding 5 things, it would go:

  1. Schedule first API workflow (4 items remaining in queue)
  2. The first API workflow schedules the second once it's done (3 items remaining in queue), and so on recursively until there's nothing left in the queue - see the sketch below.
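A minimal sketch of that logic, assuming a QueueJob record per queued item (TypeScript as illustration; processJob stands in for the actual API workflow body):

```ts
interface QueueJob { id: string; complete: boolean; createdAt: Date; }

// Placeholder for the actual work done by the backend workflow.
async function processJob(job: QueueJob): Promise<void> {
  console.log(`processing ${job.id}`);
}

function addJob(queue: QueueJob[], job: QueueJob): void {
  const wasIdle = queue.every((j) => j.complete);
  queue.push(job);
  // Only kick off a worker if the queue was idle; otherwise the running worker
  // will pick this job up when it schedules its next iteration.
  if (wasIdle) void runWorker(queue);
}

async function runWorker(queue: QueueJob[]): Promise<void> {
  const next = queue
    .filter((j) => !j.complete)
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())[0];
  if (!next) return;        // nothing pending: the recursion ends here
  await processJob(next);
  next.complete = true;
  await runWorker(queue);   // schedule the next job only once this one is done
}
```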

Hope this helps.

Also, nobody pick on me for having both completed (yes/no) and completedDate (date) instead of just having completedDate and knowing that if completedDate is not empty it must also be completed. It’s 1am :slight_smile:


@hi_bubble here is a link to a forum post that outlines an approach for using the Schedule Backend Workflow on a List function.

You can find other posts in that thread discussing the approach.

The approach outlined in those posts lets you run an action only after all items in the list have finished processing, so you don't need a recursive backend workflow that costs extra WUs to schedule itself and to carry over its parameter (a list of things) on each run. When used with Schedule Backend Workflow on a List, it also doesn't require additional data types or a scheduled backend workflow per item in the list; instead, it schedules a single backend workflow once all items have completed.

The linked posts discuss how to avoid race conditions when using Schedule Backend Workflow on a List while avoiding the WU costs of recursive backend workflows.

@georgecollier similar to both your approach and @boston85719's, I have a data type queueWorker with 1) a Queue Job Option and 2) an 'is On?' yes/no, and that's it.

All the DB trigger has to do is check whether the relevant queueWorker is on or not. There is only one QW per Queue Job Option (maybe also per user Company), so there are no records to create, etc.
If it is on → do nothing (the QW will find this record to process).
If it's off → CE to turn it on, passing the record that triggered the DB trigger (helpful because there is often only one record to modify, so it cuts down on a search) - sketched below.
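A minimal sketch of that gate, assuming hypothetical findNextRecord/processRecord helpers (TypeScript as illustration only, not Bubble's API):

```ts
interface QueueWorker { jobOption: string; isOn: boolean; }

// Stand-ins for "find the next record to process" and the work the trigger used to do.
const pendingRecords: Record<string, string[]> = {};
async function findNextRecord(jobOption: string): Promise<string | undefined> {
  return pendingRecords[jobOption]?.shift();
}
async function processRecord(id: string): Promise<void> {
  console.log(`processing ${id}`);
}

// The DB trigger only checks the flag; a running worker keeps draining the queue.
function onDatabaseTrigger(worker: QueueWorker, triggeredRecordId: string): void {
  (pendingRecords[worker.jobOption] ??= []).push(triggeredRecordId);
  if (worker.isOn) return; // already on: the QW will find this record itself
  worker.isOn = true;      // the "CE to turn it on" step
  void drainQueue(worker);
}

async function drainQueue(worker: QueueWorker): Promise<void> {
  let recordId = await findNextRecord(worker.jobOption);
  while (recordId) {
    await processRecord(recordId);
    recordId = await findNextRecord(worker.jobOption);
  }
  worker.isOn = false;     // nothing left: switch off until the next trigger
}
```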

A big advantage of this approach is the ability to stop all QWs that the current QW is a prerequisite for (the first step of the QW checks whether any QWs that use the current QW as a prereq are on).