RE: Stuck on this: How to access thousands of DB records and put into JSON?

Hey @gf_wolfer,

Just curious about the approach you’re currently using for this issue. I noticed that thread was created before the announcement of the list :format as text feature. It seems that feature, coupled with a recursive workflow, might be the best approach (as far as capacity and scalability go). At least, that’s the approach I was planning to take, but I was wondering how things have been working out for you.

-Steve

Are you trying to download a lot of data to the browser? Or just place it in an API request server side?

I ended up staying with my original setup, as the main constraint is simply downloading/accessing 2,000+ data items from Bubble’s servers. It runs on-page, kind of in the background, while the User performs other actions, so my setup works for my use case for now. The :formatted as text feature can help me format the JSON the way I want, and I use it in other places for this, but I still need to Do a Search for all the records before I can split them into groups of 500.
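Conceptually, the “split into groups of 500” step is just chunking the full search result and serializing each chunk; here’s a rough Python sketch of that idea (the field names and data are invented purely for illustration):

```python
import json

def chunks(items, size=500):
    """Yield consecutive groups of up to `size` records."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Placeholder data standing in for the 2,000+ records returned by the search.
contacts = [{"email": f"user{i}@example.com"} for i in range(2000)]

for group in chunks(contacts, 500):
    payload = json.dumps(group)          # roughly what :formatted as text builds
    print(len(group), "records ->", len(payload), "characters of JSON")
```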

Definitely still interested in learning how to best work with large data sets though so happy to see it being discussed again.

For the recursive workflow idea, I’m not sure if that would help, as I think you would still need to download the entire data list in the first workflow so that you can then split it up into groups with :filter (if we are talking about a browser action instead of server-side). If you could paginate a Do a Search for, I think that approach would help, though. So maybe using the Data API, which I think returns 100 items per page, would help.
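Roughly, a paginated pull from the Data API could look like the sketch below (Python, for illustration only; the app URL, type name, and token are placeholders, and the response shape with `results`, `cursor`, and `remaining` is from memory, so double-check it against the docs):

```python
import requests

# Sketch only -- the app name, type name, and token below are placeholders.
BASE = "https://yourapp.bubbleapps.io/api/1.1/obj/contact"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

def fetch_all_contacts(page_size=100):
    """Walk the Data API one page at a time instead of pulling everything at once."""
    cursor = 0
    items = []
    while True:
        resp = requests.get(
            BASE,
            headers=HEADERS,
            params={"cursor": cursor, "limit": page_size},
        )
        resp.raise_for_status()
        body = resp.json()["response"]
        results = body["results"]
        items.extend(results)
        if not results or body.get("remaining", 0) <= 0:   # no more pages left
            break
        cursor += len(results)
    return items
```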

Perhaps I’m misunderstanding, but sending dynamically generated JSON to the Postmark API seems ideally suited to a recursive workflow, which could operate in the background on a data set of any size. There would be no need to retrieve the entire data set client-side, and it would be much easier on capacity.
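For reference, a single call to Postmark’s batch endpoint could look something like the sketch below; each pass of a recursive workflow would build and send one such batch. The sender address and message fields are placeholders, so check Postmark’s docs for the exact schema.

```python
import requests

POSTMARK_BATCH_URL = "https://api.postmarkapp.com/email/batch"

def send_batch(contacts, server_token):
    """Send up to 500 messages in one call to Postmark's batch endpoint."""
    messages = [
        {
            "From": "sender@example.com",     # placeholder sender
            "To": c["email"],
            "Subject": "Hello",
            "TextBody": f"Hi {c['name']}",
        }
        for c in contacts[:500]               # Postmark caps a batch at 500 messages
    ]
    resp = requests.post(
        POSTMARK_BATCH_URL,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "X-Postmark-Server-Token": server_token,
        },
        json=messages,
    )
    resp.raise_for_status()
    return resp.json()
```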

I’ve already built a batch processor in pure Bubble, which can iterate over a list of things of arbitrary size and complexity. I currently use it to dump thousands of records, which pull data from a number of related tables, to a CSV file. At its core, it’s based on recursive workflows.

The primary impetus for creating it was the fact that this other approach falls flat on its face due to the hard-coded 30-second timeout.

However, I also use the batch processor to automate a nightly “data flattening” process, which essentially transforms and caches complex data structures into a “flatter” format in a dedicated table, making it more suitable for “interactive” visualizations (charts and graphs).
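To make the “flattening” idea concrete, here’s a toy sketch: nested/related data gets collapsed into one flat row per record so a chart can read it directly (every field name here is invented for illustration):

```python
def flatten_order(order):
    """Collapse a nested record into one flat row suitable for charting."""
    return {
        "order_id": order["id"],
        "customer_name": order["customer"]["name"],
        "customer_city": order["customer"]["address"]["city"],
        "item_count": len(order["items"]),
        "total": sum(i["price"] * i["qty"] for i in order["items"]),
    }

nested = {
    "id": 42,
    "customer": {"name": "Ada", "address": {"city": "London"}},
    "items": [{"price": 9.99, "qty": 2}, {"price": 4.50, "qty": 1}],
}
print(flatten_order(nested))
# {'order_id': 42, 'customer_name': 'Ada', 'customer_city': 'London', 'item_count': 2, 'total': 24.48}
```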

BTW, in your case, you should be able to get the total count to display client-side but do the retrieval and data processing in the background.

I’ll give my batch processor a try with the Postmark API when I get a chance and post back with the results.

-Steve


What if a user saved a 10,000-person contact list as a JSON string in a single DB entry? It couldn’t be worked with natively, but the search would effectively return only 1 entry (albeit a larger one), with much less searching and batching of results, right?

This is an approach that Backendless recommends for some users for certain data actions.

This effectively becomes a NoSQL-style format of data storage, without a way to easily search or filter.
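In code terms, the idea looks roughly like this: the whole list lives in one text field as serialized JSON, so retrieval is a single record read, but nothing can be searched or filtered until the whole blob is decoded (a Python sketch with made-up fields):

```python
import json

contacts = [{"name": f"Person {i}", "email": f"p{i}@example.com"} for i in range(10_000)]

# "Store": one DB row holding the entire contact list as a single text/JSON field.
blob = json.dumps(contacts)              # one large string saved on the record

# "Read": one fetch, then decode everything before any per-contact work can happen;
# there is no way to search or filter individual contacts natively in this form.
restored = json.loads(blob)
print(len(restored))                     # 10000
```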

What about Firebase/Firestore for records that large? The data would come back in under a second, but then you have to process and manage all the data calls yourself.

Are you suggesting to “pre-construct” the JSON output for each contact? That seems super inflexible to me. What happens when the template is updated? Recursion seems far superior in terms of scalability and flexibility.

Or am I misunderstanding what you’re saying?

No. It’s inflexible, you’re right.

But, for example, the scheduling software my work uses hasn’t changed its contact-info formatting in years. It’s always been the same, at least on the front end. So static isn’t always bad if you got it right the first time (or spend the time to make sure you got what you need; that’s obviously not always possible).

Agreed that recursion makes sense for x number of actions; I just hadn’t come up with an efficient way to access thousands of DB records at once, which it sounds like this can do. I’d be more interested in what exactly a batch processor is and how it works.

In this case, though (if I understand correctly), Postmark limits the number of emails sent at a time to 500, so there must be some mechanism for “batching” (or pagination or whatever you want to call it) anyway. My batch processor just “churns through the list” in the background. The batch size and processing interval are also configurable, thereby making it somewhat “tunable”. I also implemented “queuing” based on a cooperative locking mechanism so that multiple batch processes aren’t trying to modify records in the same table at the same time. Anyway, it works quite well for its current uses, but I haven’t tried it yet with the Postmark API.
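For what it’s worth, the cooperative locking amounts to something like the sketch below: before a batch run touches a table, it checks whether another run already holds the “lock” for that table and backs off if so. The lock store and names here are stand-ins purely for illustration, not how it’s actually wired up in Bubble.

```python
import time

# In-memory stand-in for a "lock" data type: one entry per table being processed.
locks = {}   # table_name -> owner of the lock; absent means the table is free

def try_acquire(table_name, owner):
    """Cooperative locking: a batch run only proceeds if no one else holds the lock."""
    if table_name in locks:
        return False            # another batch process is already working on this table
    locks[table_name] = owner
    return True

def release(table_name, owner):
    if locks.get(table_name) == owner:
        del locks[table_name]

if try_acquire("contacts", owner="nightly-flattener"):
    try:
        pass   # ... process one batch of records here ...
    finally:
        release("contacts", owner="nightly-flattener")
else:
    time.sleep(30)   # back off; the queued run retries later
```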

Recursion is your friend. :wink:


Just to be clear, recursion would not process the entire data set “at once”. In fact, the whole idea is to process the data set in smaller, more “digestible” chunks (batches), thereby eliminating capacity spikes. To do this, you must keep track of which entries have been processed and which haven’t, eventually working through the entire data set.

If all of the needed information resides in a single data type (table), then things proceed rather quickly. It takes longer the more “look-ups” (searches) must be done against other related data types. That’s generally not a big deal, since the process happens in the background. Another nice thing is that you can “know” when the process is complete (all items have been processed) and take appropriate action (notify the user, kick off another process, whatever).
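As an illustration of the mechanism (not Bubble-specific), the bookkeeping can be as simple as a “processed” flag that each pass advances until nothing is left:

```python
def process_batch(records, batch_size=500):
    """One 'pass' of the recursive workflow: handle the next unprocessed chunk,
    mark it done, and report whether anything is left."""
    pending = [r for r in records if not r["processed"]]
    batch = pending[:batch_size]
    for r in batch:
        # ... do the real work here (build JSON, call the API, etc.) ...
        r["processed"] = True
    return len(pending) > batch_size          # True -> schedule this step again

records = [{"id": i, "processed": False} for i in range(2000)]
while process_batch(records):                 # in Bubble, this re-scheduling is the recursion
    pass
print(all(r["processed"] for r in records))   # True once the whole set is worked through
```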

A decent overview of the basic mechanism is provided in the docs. I just abstracted it somewhat, creating a special data type and reusable element to manage the various batch processes.

-Steve


Thanks for the clarification; “at once” was a poor choice of wording on my part.

Do you have an example of the Workflow Actions you could share?
Or possibly create this setup using our Contacts to Postmark use case? :metal:

Is this using Do a Search for :items until #?
For some reason I thought that method still retrieved all the rows of the search (2,000 contacts in our example) and then filtered the items to reference the first 500, so capacity-wise it would still be a search for 2,000 records. Or maybe that only applies to on-page searches, or maybe it has changed; it’s been a while since I tested this.

Exactly. That’s how I control batch size.

Nope. In my experience, it’s a bit like the MySQL LIMIT clause.
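A quick way to picture the difference being discussed (purely illustrative Python, not Bubble internals):

```python
from itertools import islice

def search_results():
    """Stand-in for a search that could yield thousands of rows."""
    for i in range(10_000):
        yield {"id": i}

# LIMIT-like behaviour (":items until #" as described above): only the first
# 500 rows are ever materialised.
first_500 = list(islice(search_results(), 500))

# Fetch-then-filter behaviour (the concern raised above): every row is pulled
# first, then trimmed down to 500.
everything = list(search_results())
also_500 = everything[:500]

print(len(first_500), len(also_500))     # 500 500
```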

I’ve been thinking about putting together a tutorial on looping in Bubble. For some reason, there seems to be a misconception that you can’t iterate over a list of things, but you can - both server- and client-side.

-Steve


This is great to know; looks like I’ve got some cleaning up to do in my app :raised_hands:


Hi Steve… quick question for you. How many records do you typically use as your batch size? I have been using between 50 and 100 and was curious if you had a point of view on what is optimal (understanding that the complexity of the data is a factor). When I factor in the forced delays in the process (between iterations and batches), I tend to average around 1.5 seconds of processing time per record. I’m looking to speed things up given the amount of data I am processing and thought increasing the batch size might be an angle. Any thoughts on this would be appreciated.

Thank you!

Ken

Hi, Ken. Honestly, the best advice I can give is to experiment. As you hinted, it depends entirely on the data structure, the kind of “processing” being done, and how well-designed the schema and workflows are.

The Bubble docs mention 500 per batch, but that example is changing just one field in a single table. All one can do is bump up the batch size, tweak the processing interval, and see. The optimal balance is specific to each use case.
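If it helps while experimenting, the back-of-the-envelope math is straightforward; here’s a tiny sketch for estimating total run time from batch size, per-record cost, and the forced delay between batches (the record count and interval are placeholders, using the ~1.5 s/record figure mentioned above):

```python
def estimated_runtime(total_records, batch_size, seconds_per_record, batch_interval):
    """Rough total-time estimate for a recursive batch run."""
    batches = -(-total_records // batch_size)          # ceiling division
    working = total_records * seconds_per_record       # time spent actually processing
    waiting = (batches - 1) * batch_interval           # forced delay between batches
    return working + waiting

# e.g. 5,000 records at ~1.5 s each, batches of 100 with a 10 s gap between them:
print(estimated_runtime(5_000, 100, 1.5, 10) / 60, "minutes")   # ≈ 133 minutes
```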

Thanks for the quick response, especially over the weekend. It all makes sense; I wanted to make sure I wasn’t missing something obvious. Documentation is fairly thin when it comes to large batch processes. I did some experimenting over the weekend and am getting close to the right balance of processing time and capacity utilization. Will keep at it. Thanks again!

Ken