External DB has 100,000-1M users that need to be transferred to Bubble.
There are 3 important datatypes that need to be linked together
For each ‘external user’ there is a
MEMBER: with 2 mandatory relations to a ‘PARTNER’ and ‘USER’
USER: with 1 mandatory relation to a ‘MEMBER’ (This is the default bubble user data type)
So at least 1 of these records has to exist before the other on import. I opted for MEMBER as this is a direct copy of the external db, where as the USER is used only to allow the bubble magic link authentication process to work.
Anyway, I’m experimenting with what would be the most efficient and stable for importing this records so that the relationships were create
Method 1: CSV upload and subsequent modify
Upload 100k MEMBER records
Upload 100k USER with same spreadsheet
Modify 100,000k members to connect member to user (first, export the members to get the uniqueID)
EmailAddress to find user
Method 2: Run API workflow on a list
Workflow to get list (not batched)
Workflow to process each record
Pretty sure this would break if there number in that list is >20,000
Method 3: Run API workflow on a list from item N until item N, recursively
Workflow to get list (batched by item number)
Workflow to process each record
Can run concurrently
Method 4: Run API workflow on a list, selecting batch number +1 until batch number + final batch number
Workflow to get list (batched by hardcoded batch number)
Workflow to process each record
Can be run concurrently,
Method 5: Run API workflow and select first item where status = pending
Recursive workflow to process each record
I see this taking far too long as it cannot be run concurrently Edit: I suppose it could if the criteria was status = pending & batch number = N. As long as the recursive wfs don’t run against the same batch number
Method 6: DB trigger
DB trigger on new member created.
What happens when members added in Bulk?
Notes:
Would you update a log table to +1 each record or log something to keep track/use as a reference
IMO, option #5 is going to be the most reliable, but yes, it’s going to take 2 million years to finish.
I was recently playing around with something like option #3. It worked pretty well, but each flow that was running in the list was pretty simple and only ran 10 things from the list at a time. My problem with this approach was I didn’t know when to re-call the workflow on a list. I set it to current date/time + 40 seconds. This worked because it was a lightweight call.
If I had more time, I would probably create another temporary data thing called batch which tracks each workflow run on a list. Each time the workflow ran on a list item, it would add that thing to the batch record. Once the batch record count of processed items = total records added for the batch, I’d run the loop.
If it’s an SQL server, I would do a simple select statement and a server side create loop. Now I would still do these in remember batches like 1000 though.
Then when done, start the process over. I don’t think I would want to do each one completely individually.
As you already know the relations beforehand, you can have a field/row on the MEMBER list → to which user they belong (added externally). Then, only import the User table.
Upon user login, run workflow along with the login or mask it with some loading screen → find MEMBERS that should belong to this user and add them to Current User’s field on the spot. This should be lightweight and spread over time as users won’t be logging in at the same time and you only have to run 1 workflow for each user upon login.
You can also have a field on the User isOnboarded to switch to YES and have this as a condition for the above workflow, so you don’t update their Members on every login.
I was trying to work on this, but ran out of time… FYI, I uploaded a 200,000 row CSV file to Supabase (created the table and data), and it took less than 1 minute…
Matching always uses “do a search for” so it does not really matter what it is comparing. As long as it finds something. It always compares text strings.
Bubble can find 100 out of 1 000 000 things pretty quick if that’s the only thing happening at the moment. Then the 100 results are already there so you just put/save them to the User’s Members field.