Discussion needed: How to generate unique and sequential integer IDs for records

Using slugs won’t scale because every time you force Bubble to increment the slug ID (e.g.: slug-100, slug-101, slug-102), it consumes more WU.

However, I noticed that Bubble’s scheduler seems to use unique and incrementing integers.

Imagine a database with this single record:

Unique ID: 111x982
Scheduled Workflow ID: 482842
Sequence ID: 1000

Then you create two more records. On the record creation workflow, you add two steps:

  1. Schedule API workflow that will do something with this newly created record, maybe a second into the future
  2. Save the API workflow ID to the record

So if we do that twice, we get this database:

First record
Unique ID: 111x982
Scheduled Workflow ID: 482842
Sequence ID: 1000

Second record
Unique ID: 111x881
Scheduled Workflow ID: 485112
Sequence ID: null

Third record
Unique ID: 111x592
Scheduled Workflow ID: 485117
Sequence ID: null

Note that the Unique IDs are arbitrary and so are the scheduled workflow IDs. The only thing that matters is that the Scheduled Workflow IDs are sequential and unique.

Then the scheduled workflow does this:

  1. Search for the first record that has a Sequence ID of null, ordered by Scheduled Workflow ID ascending.
  2. Does that returned record have the same Scheduled Workflow ID as the record in the scheduled workflow?
  3. If so, set this record’s Sequence ID to the highest existing Sequence ID plus 1.
  4. If not, reschedule itself and save the new Scheduled Workflow ID to itself.

Using our sample database:

  • If record 2’s scheduled workflow runs before record 3’s, then the Sequence ID will be set to 1,001. When record 3’s scheduled workflow runs, it will set its Sequence ID to 1,002.

  • If record 3’s scheduled workflow runs before record 2’s, then it will reschedule itself until record 2’s scheduled workflow runs.
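
To make the ordering logic concrete, here’s a toy in-memory model of the scheme in TypeScript. The `records` array and `nextWorkflowId` counter are stand-ins for Bubble’s database and scheduler; nothing here is a Bubble API:

```typescript
// Toy model of the proposed scheme. In Bubble these would be searches,
// "make changes" actions, and Schedule API Workflow steps.
type Rec = { uniqueId: string; scheduledWorkflowId: number; sequenceId: number | null };

const records: Rec[] = [
  { uniqueId: "111x982", scheduledWorkflowId: 482842, sequenceId: 1000 },
  { uniqueId: "111x881", scheduledWorkflowId: 485112, sequenceId: null },
  { uniqueId: "111x592", scheduledWorkflowId: 485117, sequenceId: null },
];
let nextWorkflowId = 485118; // pretend the scheduler keeps handing out higher IDs

function runScheduledWorkflow(rec: Rec): void {
  // Step 1: first record with no Sequence ID, ordered by Scheduled Workflow ID.
  const next = records
    .filter((r) => r.sequenceId === null)
    .sort((a, b) => a.scheduledWorkflowId - b.scheduledWorkflowId)[0];

  // Step 2: is this record the next one in line?
  if (next === rec) {
    // Step 3: highest existing Sequence ID plus 1.
    const highest = Math.max(...records.map((r) => r.sequenceId ?? -Infinity));
    rec.sequenceId = highest + 1;
  } else {
    // Step 4: not our turn yet; reschedule and save the new workflow ID.
    rec.scheduledWorkflowId = nextWorkflowId++;
  }
}

runScheduledWorkflow(records[2]); // record 3 runs first: it just reschedules
runScheduledWorkflow(records[1]); // record 2 is next in line: sequenceId -> 1001
runScheduledWorkflow(records[2]); // record 3's retry: sequenceId -> 1002
```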

Assumptions:

  1. Bubble’s scheduler returns a unique and sequential integer.
  2. Bubble’s scheduled workflows are guaranteed to run.

Safeguards:

  1. In case something goes wrong, you don’t want scheduled workflows endlessly rescheduling themselves every second. You could add a reschedule counter that auto-terminates the workflow and notifies you.

I haven’t tested this, but before I put in time, does anyone see why this wouldn’t work? I was up late last night watching hockey, so not totally sharp today.

Yes, I know this consumes WUs, but it should be consistent WUs and maybe worth it if having unique sequential IDs matters. E.g.: generating invoice numbers.

@mikeloc @georgecollier @lindsay_knowcode @chris.williamson1996 and other smart Bubblers I’ve forgotten


Throwing in an alternative I’ve used (if I get your use case) that still consumes WUs:

Have an Int field on the Datatype.

In the backend, search for the highest one in the DB, add 1 to it, and create the record with that number. Then check that there is only one record with that new number. If not, rerun the loop.
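
Roughly, as a toy sketch (the in-memory `records` array is a hypothetical stand-in for the Bubble database; in Bubble these would be backend workflow steps):

```typescript
// Sketch of the search-highest-then-verify loop ("rerun the loop" on a collision).
type Row = { id: number };
const records: Row[] = [{ id: 10001 }];

function createWithNextId(): Row {
  while (true) {
    // Search for the highest ID and add 1 for the new record.
    const highest = Math.max(...records.map((r) => r.id));
    const row: Row = { id: highest + 1 };
    records.push(row);

    // Check that only one record holds the new number.
    if (records.filter((r) => r.id === row.id).length === 1) return row;

    // A concurrent create claimed the same number: remove ours and retry.
    records.splice(records.indexOf(row), 1);
  }
}
```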

// Peter


I like it because it’s simpler, but does that guarantee uniqueness?

No, it doesn’t…

This will always be subject to race-conditions… (with the potential for collisions and duplicates).

Using Slugs won’t (as far as I can tell), as slugs are the only part of the Bubble database that (presumably) uses some kind of database locking, which is probably why they have a higher WU cost than any other database operation.

How does using Scheduled WF IDs provide sequential numbers? There could be hundreds, or even thousands, between each WF ID, no?


We’re using the returned scheduled workflow ID to order a query.

Actually, what’s the right word?

100
101
102

Is sequential

100
103
456

Is not sequential, but it is ordered? That’s the concept I’m working with.

Just use a database trigger that runs as soon as a new item is created and counts all similar items created before the item itself. This will ensure uniqueness and sequentiality.

Ah, ok… so you’re just using the WF ID to order the items in the DB (instead of using their created-date timestamps), and then going back and generating the sequential ID afterwards?

Well, that will still be just as subject to race conditions as any other method that relies on reading the DB without any kind of locking.


But not uniqueness…

Why not!?
The only problem I see is if an item is deleted. :melting_face:

Item 1 is created:
ID = all items created before:count
ID = 0

Item 2 is created:
ID = all items created before:count
ID = 1

Item 3 is created:
ID = all items created before:count
ID = 2

Remember that this ID should be filled by a database trigger, triggered after the item is created.
Maybe I have misunderstood something?
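
In code terms, the idea is something like this toy model (the in-memory `records` array stands in for the Bubble database; the names are illustrative):

```typescript
// Toy model of the trigger idea: the ID is the count of items created
// strictly before this one.
type Item = { createdAt: number; id?: number };
const records: Item[] = [];

function create(createdAt: number): Item {
  const item: Item = { createdAt };
  records.push(item);
  return item;
}

// The "database trigger", fired after creation:
function onCreated(item: Item): void {
  item.id = records.filter((o) => o.createdAt < item.createdAt).length;
}

const first = create(1);  onCreated(first);  // id = 0
const second = create(2); onCreated(second); // id = 1
const third = create(3);  onCreated(third);  // id = 2
```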


Yep, makes sense. Thanks for helping me think it through.

I think slugs will work for what I’m trying to do. They get expensive when you get to high numbers, which won’t be the case for me.

Here’s a simple post that covers it:

https://stackoverflow.com/questions/3325458/which-locking-scheme-and-isolation-level-should-one-use-for-sequence-number-gene


Anything you do within transaction scope is subject to race conditions.

So any SQL query you do to get the last used value, increment it, and store it in a new row means that two concurrent clients could fetch the same value and try to use it, resulting in a duplicate key.

That’s the core problem with my idea and yours. Bubble is abstracting queries to make things easier, but it has the same core limits.
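
A toy illustration of that race, with the interleaving written out explicitly (the `ids` array stands in for the table):

```typescript
// Both "clients" read the last value before either writes,
// so both claim 1001.
const ids: number[] = [1000];

const readA = Math.max(...ids); // client A reads 1000
const readB = Math.max(...ids); // client B reads 1000 (A hasn't written yet)
ids.push(readA + 1);            // A writes 1001
ids.push(readB + 1);            // B writes 1001 -> duplicate key
console.log(ids);               // [1000, 1001, 1001]
```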

OK… I tried it, and it seems to work pretty well (I haven’t been able to get a duplicate or missed number in over 2,000 attempts), so this does seem like a pretty good method (although I’m still inclined to think it’s subject to race conditions; I just haven’t actually hit any in testing).

And, as you say, if items get deleted, then you end up with duplicates.


You didn’t understand my solution.

Bubble has a feature that lets you trigger a workflow after a change in the database. This change could be, for example, the creation of a new row.

So, AFTER the row is created in the database, you would trigger a workflow to count how many rows were created at a date/time before it. This way, it is impossible to have duplicate or out-of-order items, because the creation of the record is already an accomplished fact. The past is immutable.

The trick is to create this ID after the creation of the record. Not before.

So, even if thousands of people create these records together, they will be included in the database at different times, sequentially. One after the other.

That’s for sure, Adam… I’ve also tested this method a few times, creating multiple items at the same time via the API, and I’ve never encountered a problem… I don’t know if two items can be created at exactly the same time, or if item creation is always done sequentially.

The ONLY problem I see is the impossibility of removing items from the database. But you can mark them as “deleted”…

What about looking at the last ID created and adding 1? That would solve the deletion issue, because we are always looking at the last record. But I think I already tried this in the past and got race conditions when creating items in batches.

Yeah, that will definitely run into race conditions (I just tried that, plus a few variations, and ended up with multiple duplicates).

Generally, any time you’re reading something from the database, modifying it, then writing it back, you’re going to be subject to race conditions when there are multiple updates being done in close succession (which is why I’m surprised that using the count method above seems to work - but it does, at least in all my testing so far).


I know keeping it native is great and all, but after my bouts of frustration with this issue in the past, I decided to use Cloudflare Durable Objects to track sequences. I just call its endpoint, it increments its state, and then it passes the value back to my Bubble app.

Clean, no WU, and guaranteed sequential.
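
For anyone curious, the Durable Object side is only a few lines. This is a minimal sketch along these lines, not my exact code (the class name and storage key are illustrative; types come from @cloudflare/workers-types). Requests to a single Durable Object instance are processed one at a time, so the read-increment-write below can’t interleave:

```typescript
// Minimal Durable Object sequence counter sketch.
export class SequenceCounter {
  constructor(private state: DurableObjectState) {}

  async fetch(_request: Request): Promise<Response> {
    // Read the current value (defaulting to 0), increment, persist, return.
    const current = (await this.state.storage.get<number>("seq")) ?? 0;
    const next = current + 1;
    await this.state.storage.put("seq", next);
    return new Response(String(next));
  }
}
```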


Unless Bubble is doing something special with database triggers, or unless I’m misunderstanding you (very possible), your solution still depends on the timing of database operations.

  1. Write to database
  2. Database trigger fired
  3. Read from database: query how many records were inserted prior to step 2
  4. Use that number to update the database trigger record’s sequential ID.

It just takes step 1 happening twice before step 3, which could happen when two records are saved simultaneously, to break the logic.

If there were 10 records before the two simultaneous saves, then by the time step 3 runs both new records already exist, and step 3 would calculate 12 as the next sequence ID.

Maybe the next database trigger fires at the same time and you have duplicate incorrect sequence numbers, or maybe you wind up skipping one (12, then 13, but missing 11).

What I’m not understanding is how your solution necessarily eliminates the race condition problem. Your steps are not guaranteed to happen in that order when concurrent database operations are happening, which means that you can’t guarantee either uniqueness or an unbroken sequence.
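
For example, here’s a toy version of the counting trigger where two records are created with the same timestamp (hypothetical values). Both count the same number of earlier records and get the same ID:

```typescript
// Two items with identical creation timestamps both count 0 earlier
// records, so both get ID 0 -- a duplicate.
type Item = { createdAt: number; id?: number };
const items: Item[] = [];

function create(createdAt: number): Item {
  const it: Item = { createdAt };
  items.push(it);
  return it;
}

function trigger(it: Item): void {
  // "count all similar items created before the item itself"
  it.id = items.filter((o) => o.createdAt < it.createdAt).length;
}

const a = create(1000);
const b = create(1000); // simultaneous insert: same timestamp
trigger(a);
trigger(b);
console.log(a.id, b.id); // 0 0 -> duplicate Sequence IDs
```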

I think this explains why slugs are so expensive to use. Bubble guarantees uniqueness of slugs, so when it autoincrements the -n suffix on duplicate slugs, it’s doing something that deals with the race condition problem.

Not trying to trash your suggestion! Just thinking it through and trying to understand.

Doesn’t account for deleted records. I’m probably missing something, as it seems like the simple solution is efficient and works well: add an ID number field to the data type, create a dummy record with ID 10001 or the like, and for every new record, look at the last created record and increment its ID by one. The only other considerations are bulk-creating records and deleting the most recent record, which can be handled in a variety of ways…

Why do any math? The only thing that’s needed is sequential and unique, so just use the current date/time as a number of milliseconds for the ID. They will be unique and sequential since, if the backend database trigger approach is implemented, it likely won’t have any race conditions (which means all creation dates, down to the millisecond, are different = unique).
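
As a sketch, that’s just (assuming millisecond timestamps never collide, which is the open question above):

```typescript
// Use the creation time in milliseconds as the ID. Unique only if no
// two records are ever created in the same millisecond.
const id = Date.now(); // e.g. 1717171717171
```

(In the thread’s earlier terms, that gives ordered IDs rather than gap-free sequential ones.)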