Best approach for importing 100k-1M users

Here’s my scenario/example.

I will log my results as well.

External DB has 100,000-1M users that need to be transferred to Bubble.

There are 3 important data types that need to be linked together.

For each ‘external user’ there is a:

  1. MEMBER: with 2 mandatory relations, to a ‘PARTNER’ and a ‘USER’
  2. USER: with 1 mandatory relation to a ‘MEMBER’ (this is the default Bubble user data type)

So one of these records has to exist before the other on import. I opted for MEMBER, as it is a direct copy of the external DB, whereas the USER is used only to allow Bubble’s magic-link authentication process to work.
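To make the shape concrete, here’s a rough sketch of the two record types as Python dataclasses. The field names are my assumptions, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Member:
    # Direct copy of the external DB row.
    external_id: str
    email: str
    partner_id: str         # mandatory relation to a PARTNER
    user_id: Optional[str]  # mandatory relation to a USER; empty until linked

@dataclass
class User:
    # Bubble's built-in User type; exists only so magic-link auth works.
    email: str
    member_id: str          # mandatory relation back to the MEMBER
```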

Anyway, I’m experimenting with what would be the most efficient and stable way of importing these records so that the relationships are created.

Method 1: CSV upload and subsequent modify

  1. Upload 100k MEMBER records

  2. Upload 100k USER records using the same spreadsheet

  3. Modify the 100k MEMBER records to connect each member to its user (first, export the members to get the unique IDs)

  4. Match on EmailAddress to find the corresponding user (the join step is sketched below)
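Steps 3–4 are essentially an offline join: export both tables, match USER to MEMBER on email address, and produce a modify CSV that carries each USER’s unique ID. A minimal sketch of that step, assuming column names like EmailAddress and unique id:

```python
import csv

# Build an email -> unique ID lookup from the exported USER table.
users_by_email = {}
with open("users_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        users_by_email[row["EmailAddress"].lower()] = row["unique id"]

# Emit a modify file: each MEMBER's unique ID plus its matched USER's unique ID.
with open("members_export.csv", newline="") as f_in, \
     open("members_modify.csv", "w", newline="") as f_out:
    writer = csv.DictWriter(f_out, fieldnames=["unique id", "User"])
    writer.writeheader()
    for row in csv.DictReader(f_in):
        user_id = users_by_email.get(row["EmailAddress"].lower())
        if user_id:  # skip members with no matching user
            writer.writerow({"unique id": row["unique id"], "User": user_id})
```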

Method 2: Run API workflow on a list

  1. Workflow to get list (not batched)
  2. Workflow to process each record

Pretty sure this would break if the number of items in that list is >20,000.

Method 3: Run API workflow on a list, from item N until item N + batch size, recursively

  1. Workflow to get list (batched by item number)
  2. Workflow to process each record

Can run concurrently
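Outside the Bubble editor, the recursion here boils down to “process items N through N + batch size, then schedule yourself again at N + batch size”. A toy sketch of the pattern (the batch size and the per-record step are placeholders):

```python
BATCH_SIZE = 1000  # assumed; tune to what your server tier tolerates

def process(record):
    # Stand-in for the per-record workflow (create USER, link MEMBER).
    pass

def run_batch(records, start=0):
    batch = records[start:start + BATCH_SIZE]
    if not batch:
        return  # base case: past the end of the list, recursion stops
    for record in batch:
        process(record)
    run_batch(records, start + BATCH_SIZE)  # "re-schedule" at the next offset
```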

Method 4: Run API workflow on a list, selecting batch number N, then N + 1, up to the final batch number

  1. Workflow to get list (batched by hardcoded batch number)
  2. Workflow to process each record

Can be run concurrently.

Method 5: Run API workflow and select first item where status = pending

  1. Recursive workflow to process each record

I see this taking far too long, as it cannot be run concurrently. Edit: I suppose it could, if the criterion were status = pending AND batch number = N, as long as the recursive workflows don’t run against the same batch number (sketched below).
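A toy sketch of that partitioning: one recursive worker per batch number, each only ever taking the first pending record in its own batch, so concurrent workers never collide:

```python
import threading

# Toy stand-in for the database: each record has a status and a batch number.
records = [{"id": i, "status": "pending", "batch": i % 4} for i in range(20)]

def drain_batch(n):
    # Mirrors "first item where status = pending AND batch number = N".
    pending = next((r for r in records
                    if r["status"] == "pending" and r["batch"] == n), None)
    if pending is None:
        return                  # this batch is drained; recursion ends
    pending["status"] = "done"  # per-record work would happen here
    drain_batch(n)              # recurse onto the next pending record

# One concurrent worker per batch number.
workers = [threading.Thread(target=drain_batch, args=(n,)) for n in range(4)]
for w in workers: w.start()
for w in workers: w.join()
```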

Method 6: DB trigger

  1. DB trigger on new member created.

What happens when members are added in bulk?

Notes:

  1. Would you update a log table (+1 per record), or log something else, to keep track / use as a reference?

Following this thread. This is one of my questions as well. :smiley:

IMO, option #5 is going to be the most reliable, but yes, it’s going to take 2 million years to finish.

I was recently playing around with something like option #3. It worked pretty well, but each flow that ran on the list was pretty simple and only processed 10 items from the list at a time. My problem with this approach was that I didn’t know when to re-call the workflow on the list. I set it to current date/time + 40 seconds, which worked because it was a lightweight call.

If I had more time, I would probably create another temporary data thing called ‘batch’ which tracks each workflow run on a list. Each time the workflow ran on a list item, it would add that item to the batch record. Once the batch record’s count of processed items equals the total records added for the batch, I’d kick off the next loop.
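A rough sketch of that completion check, assuming the batch record tracks a processed count and an expected total (all names are made up):

```python
def schedule_next_batch(batch):
    # Stand-in for re-calling "Schedule API workflow on a list"
    # with the next slice of records.
    print(f"batch {batch['id']} complete; scheduling the next one")

def on_item_processed(batch):
    # Each workflow run on a list item reports back to its batch record.
    batch["processed"] += 1
    # Only once every item has reported in is it safe to loop again.
    if batch["processed"] == batch["total"]:
        schedule_next_batch(batch)

batch = {"id": 1, "processed": 0, "total": 10}
for _ in range(10):
    on_item_processed(batch)  # fires schedule_next_batch on the 10th call
```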

Looking forward to seeing what you find!

If it’s a SQL server, I would do a simple SELECT statement and a server-side create loop. I would still do these in batches of, say, 1,000, though.

Then when done, start the process over. I don’t think I would want to do each one completely individually.
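Assuming a SQL source, the loop’s shape is: page through the table in fixed-size chunks and create records chunk by chunk. A sketch with sqlite3 standing in for the real server, and table/column names assumed:

```python
import sqlite3  # stands in for the real SQL server driver

BATCH = 1000

def create_member(row):
    # Stand-in for the server-side create (one record per source row).
    pass

conn = sqlite3.connect("external.db")
last_id = 0
while True:
    # Keyset pagination: much cheaper than OFFSET once tables get large.
    rows = conn.execute(
        "SELECT id, email, partner_id FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH),
    ).fetchall()
    if not rows:
        break                  # done; restart the whole process to re-sync
    for row in rows:
        create_member(row)
    last_id = rows[-1][0]      # remember where this batch ended
```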

Are users expected to log in at some point?

As you already know the relations beforehand, you can have a field on each MEMBER row → which user it belongs to (added externally). Then, only import the User table.

Upon user login, run a workflow along with the login, or mask it with a loading screen → find the MEMBERs that should belong to this user and add them to the Current User’s field on the spot. This should be lightweight and spread over time, as users won’t all be logging in at the same time, and you only have to run one workflow per user upon login.

You can also have a field on the User, isOnboarded, that switches to YES, and use it as a condition for the above workflow so you don’t update their Members on every login.
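As pseudologic, that login hook might look like this; the isOnboarded and Members fields come from the suggestion above, and the search helper is hypothetical:

```python
def search_members(owner_key):
    # Stand-in for "Do a search for MEMBERs" constrained to this user's key.
    return []

def on_login(user):
    # Skip users whose Members were already attached on a previous login.
    if user["isOnboarded"]:
        return
    # Find the MEMBERs pre-tagged externally with this user's key
    # and attach them all in one pass.
    user["Members"] = search_members(user["email"])
    user["isOnboarded"] = True  # ensures this only ever runs once per user
```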


Where the data-hoarders at? @NigelG used to break Bubble doing these types of things. He might have some insights.

I was trying to work on this but ran out of time… FYI, I uploaded a 200,000-row CSV file to Supabase (it created the table and the data), and it took less than 1 minute…


You know what, this is the most practical solution for my use case.

I’m still going to run some tests to see its speed.

I’m sure it can run quicker when the relationship field is matched on unique ID vs another field, for example.

Using the Data API, you can create the records drastically faster.
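For context, the speedup comes from the Data API’s bulk endpoint, which takes newline-delimited JSON so many records go up per request. A hedged sketch, with the app URL, type name, and fields all placeholders:

```python
import json
import requests

API_ROOT = "https://yourapp.bubbleapps.io/api/1.1/obj"  # placeholder app URL
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
    "Content-Type": "text/plain",  # bulk endpoint expects newline-delimited JSON
}

def bulk_create(typename, records, chunk=500):
    # Send records in chunks: one JSON object per line, one POST per chunk.
    for i in range(0, len(records), chunk):
        body = "\n".join(json.dumps(r) for r in records[i:i + chunk])
        resp = requests.post(f"{API_ROOT}/{typename}/bulk",
                             headers=HEADERS, data=body)
        resp.raise_for_status()

members = [{"email": f"user{i}@example.com"} for i in range(1000)]  # toy data
bulk_create("member", members)
```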


Matching always uses “Do a search for”, so it does not really matter what it is comparing, as long as it finds something. It always compares text strings.

Bubble can find 100 out of 1,000,000 things pretty quickly if that’s the only thing happening at the moment. Then the 100 results are already there, so you just save them to the User’s Members field.