Stuck on this: How to access thousands of DB records and put into JSON?

Hey all, I have an app where my customers have list of contacts (500 to 5,000 records). I want to allow them to send an email blast to them and to do this I need to convert the contacts to JSON and then send to Postmark (could be another service)

Currently I use the Toolbox plugin which loads that records and builds the JSON on page load, which seems to work alright for under 1,000 contacts, but it takes a long time for larger lists and uses a lot of capacity.

Any ideas on the most efficient way to find thousands of DB records, convert them to JSON and then send the JSON(s) in an API call?


Are you doing this on the server side, or in the page? And, in doing this, are you doing it in a single workflow, or using "workflow on a list", or using recursive workflows?

(Because there's no reason you can't do this in a single workflow, which would be the most obvious optimization, if you hadn't done that yet.)

Even though Run JavaScript isn't the most efficient thing in the world (because it uses eval()), the real performance bottleneck here is sourcing the Contacts (the Search that compiles the list of Contacts).

I have more to say about this based on some recent experiments.

And… actually, as I think about this, Run JS could be a very poor-performing way to do what you're trying to do, since you don't really have proper ways to deal with Lists and Things like you do inside of a plugin.

But, if you can answer my previous questions, I can provide more guidance.

I am flexible with all of it. The end goal is just that my User can click 'Send Email to All Contacts' and my app sends JSONs of the contact data to an API. The JSON array is limited to 500 contacts per call, which might make it tricky and would require up to 10 API calls for 5,000 contacts.

I can do it server side or on page, and I can do it any way that makes sense from a tech standpoint.
Doing it on the page might have the slight benefit of telling the user how many Contacts the email will send to before they click Send, but that's not a necessity.

Right now, I run the following workflows based on how many contacts are loaded on the page. It takes a lot of capacity to load 2,000+ contacts on the page, but I think the actual JSON creation isn't too heavy an operation.
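For what it's worth, the 500-per-call batching can be sketched in plain JavaScript (e.g., inside a Toolbox Run JavaScript action). This is just an illustrative sketch; `chunkContacts` is a made-up name, not a Bubble or Postmark API:

```javascript
// Split a contact list into batches of up to 500 (the per-call limit
// mentioned above), so 5,000 contacts become 10 API calls.
function chunkContacts(contacts, batchSize = 500) {
  const batches = [];
  for (let i = 0; i < contacts.length; i += batchSize) {
    batches.push(contacts.slice(i, i + batchSize));
  }
  return batches;
}

// Example: 1,200 contacts -> 3 batches (500, 500, 200)
const demo = Array.from({ length: 1200 }, (_, i) => ({ email: `c${i}@example.com` }));
const batches = chunkContacts(demo);
```

Each batch could then be handed to a separate scheduled API workflow, or sent directly from the page.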

There is a way to convert a massive list of Bubble Things to JSON Array.

Check out the attached video, please:
//s3.amazonaws.com/appforest_uf/f1596821530700x807607990335294600/things_to_json.mp4

Note: Ideally, you need to create a pagination workflow for huge lists.

1,000 records - takes ~3 seconds
5,000 - takes ~20 seconds
10,000 - takes ~45 seconds
100,000 - takes ~1 min 41 seconds

It will take more time if you set more columns.


Interesting, this may help in loading all the contact data. But I would still need to take those JSON values and reformat them to fit the Postmark API JSON structure. I'm sure there's a way, I just need to find it.

Using that way, you can change the output keys' names, but it doesn't support nested objects.
I can publish this plugin as a private one for now. Please DM your email, and I'll provide you access.


Hey @gf_wolfer, how many fields does each JSON object have? I'm assuming it might be like:

{"name": "Joe Smith", "email": "joe@example.com"}

but is there more to it? (I'm asking because I want to create the exact scenario here.)

Off the top of my head, I think Postmark requires an array that can have all the Template fields you set in your Postmark Template, so you can send a bulk email with custom info in it for every Contact (unique name, User links, etc.).

So more like:

{"templatemodelid": "123", "FromName": "Keith's Email", "name": "Joe Smith", "email": "joe@example.com", "messagebody": "Hello Joe, this is the email message"},
{"templatemodelid": "123", "FromName": "Keith's Email", "name": "Frank Doe", "email": "frank@example.com", "messagebody": "Hello Frank, this is the email message"}
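Building that array from a list of contacts is basically a single map in JavaScript. A rough sketch (the field names just mirror the example above, not necessarily Postmark's exact schema, and `buildMessages` is an illustrative name):

```javascript
// Map each contact into one message object of the shape shown above.
// Check Postmark's template docs for the real required field names.
function buildMessages(contacts, templateId, fromName) {
  return contacts.map((c) => ({
    templatemodelid: templateId,
    FromName: fromName,
    name: c.name,
    email: c.email,
    // Greets each contact by first name, per the example
    messagebody: `Hello ${c.name.split(" ")[0]}, this is the email message`,
  }));
}

const msgs = buildMessages(
  [{ name: "Joe Smith", email: "joe@example.com" }],
  "123",
  "Keith's Email"
);
```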

Ah, gotcha. Would you mind sharing the script you run in your Run JS with me?

(If you don't want to post it here, you can DM it to me.)

BTW, I find on a Personal plan that I can load and process 5,000 contact-like objects in about 8-9 seconds (in a test app that isn't, of course, doing lots of other things!).

How long is "a long time"?

It's actually not the time to load so much as the load on the app's capacity. This is live in production; overall the app runs well, but you can see where I was ONE user going to this page to send an email (infrequent usage, but still). If multiple users use that feature at once, it times out.

I am using the Toolbox plugin to convert a Data Type List to a List of Texts. The magic is in this:
(screenshot: Screenshot_080720_032031_PM)

But now that I just looked it up, I am seeing errors from the inspector. So something recently broke it :scream:. It was working for 5 or so months before this. Edit: it seems my test account for this somehow wasn't working; it's still working for my Users.

The xMESSAGEhere is some Find and Replace work that I then run when the workflow happens (there was a reason for it that I can't remember right now).
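To illustrate the idea (not the exact Toolbox expression): each contact is rendered as a JSON fragment containing a placeholder token, the fragments are joined with commas into an array, and the placeholder gets swapped for the real message at send time. A hypothetical sketch with `xMESSAGEhere` as the token:

```javascript
// Join per-contact JSON fragments into one array string, then replace
// the placeholder token with the actual message body at send time.
function assemblePayload(fragments, message) {
  const joined = "[" + fragments.join(",") + "]";
  // split/join replaces every occurrence, not just the first
  return joined.split("xMESSAGEhere").join(message);
}

const fragments = [
  '{"email":"joe@example.com","messagebody":"xMESSAGEhere"}',
];
const payload = assemblePayload(fragments, "Hello!");
```

Note that a naive text replace like this breaks if the message itself contains characters that need JSON escaping, which may be why the Find and Replace step exists.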

List Item Expression?

Anyway, since you're doing this stuff in the page, note that the only thing using capacity here is any Searches that compile the information needed to ultimately feed your scheduled API workflows that cause your emails to be sent.

And there's really no way around having to get the info on (what I assume) are your users' Contacts.

(And I'm sure you know this, but just to be very clear: nothing that you do in the page (like running JavaScript, doing find and replace on text strings that are already in your page, using in-page plugins, etc.) consumes any capacity at all. Capacity is only server-side CPU usage.

Search operations, of course, are not "in the page": they require interaction with the server and consume capacity. And of course your eventual API workflows that cause the emails to be sent are server-side, but these are probably not very CPU-hungry, as they're just pinging Postmark.)

Now, it's true that Searches that return a large number of items can noticeably consume capacity, as your log screenshot shows.

Here's an example from my own test app where I'm loading large numbers of Things (in this case, an object called a Fave Thing, which has a Name text field with about 100 characters and a Price number field on it). You can see a big difference between a Search where we grab the first 5,000 items versus a Search where we grab the first 30,000 items (and timewise, loading is similarly scaled: 5,000 items load in around 8-12 seconds typically, but the 30,000 items can take anywhere from about 80 to 120 seconds):

So the only thing you can really do to improve the performance of your page is to improve the source Search or Searches and (to a lesser extent) control the amount of data that gets fetched to the page.

A couple of things to think about there (I'm not sure whether any of these will help, but I suspect they might):

  1. Is the info you need to send your emails sourced from just a single object (e.g., a Contact)? Or do you have to do multiple large searches for different objects? (I'm guessing from your graph that there's really just one type of thing being fetched. But if you're having to Search across multiple Things to assemble this information, it would probably be better to consolidate those data fields onto a single Thing. Again, I don't think this is an issue for you, but FYI.)

  2. Does that object have a bunch of stuff on it that you don't need for sending these emails? We obviously need the recipient name and their email address. But does the Contact object have a bunch of other stuff, like a geographic address, an image, a notes field that might have several K of text on it, list-type fields with a bunch of other stuff, etc.?

Note that this shouldn't really be a problem, but it could be. (In most cases, if we don't reference a Thing's field, we don't fetch the value in that field and we don't cause a database interaction.) BUT if the Contact object is complex and has many fields, you may get better performance by, instead of Searching for and returning the Thing, Searching for just the Thing's fields that you need. Instead of (for example):

Search for Contacts

Try Search for Contact's Name and Search for Contact's Email, and then you've got parallel lists of texts (what this does is destructure the Thing on the server). This would usually be something I wouldn't advise, but my testing shows that, done in quick succession, this takes pretty much the same time as Searching for the Thing itself and then destructuring in the page.
But I've only tested this on very large numbers of Things where the Thing is simple (as in my example above, where I've got a Fave Thing with only two fields on it). If the Thing is more complex and has a lot more fields, the "destructure on the server" approach may be faster. It's not clear to me whether this uses more capacity or not.
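If you do go the parallel-lists route, re-assembling contact-like objects in the page is just a zip over the two lists. A minimal sketch (illustrative names, and it assumes the two Searches return items in the same order):

```javascript
// Zip two parallel lists of texts (e.g., from two field-level Searches)
// back into one list of objects in the page.
function zipToObjects(names, emails) {
  return names.map((name, i) => ({ name, email: emails[i] }));
}

const zipped = zipToObjects(
  ["Joe Smith", "Frank Doe"],
  ["joe@example.com", "frank@example.com"]
);
```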

It's worth experimenting with. Here's something I tried. I've got these Fave Thing objects, and I tried the destructure-on-server approach (where we simultaneously Search for Fave Thing's Name and Search for Fave Thing's Price) three times, and the Search-directly-for-Things approach three times, over 20,000 items. Here are the results:

destructured 1: 39 s to return Names + 9 add'l ms to return Prices
destructured 2: 40 s + 9 add'l ms
destructured 3: 50 s + 15 add'l ms

search for Things 1: 30 s to return Fave Things + 8 ms to destructure into Names + 7 ms to destructure into Prices
search for Things 2: 31 s + 7 ms + 6 ms
search for Things 3: 33 s + 15 ms + 6 ms

Directly searching for the Things (and then destructuring in the page) does look like it was generally faster. However, it's possible that the capacity impact was greater. Here's what I see in my capacity chart:

Kind of interesting, right? (Because of the granularity here, it's hard to tell if the simultaneous searches for field values are using less instantaneous capacity but spread out over a longer timeframe, or if indeed they impact capacity slightly less.)

  3. One thing for sure: since loading (doing the Search for) Contacts is so impactful to capacity, you should not do this automatically on Page Load. What if your user visits the page accidentally? What if they are just futzing around and have no intention of actually sending the email now?

You don't need to fetch the Contacts until the user commits to sending the email, right? So only load it then. Don't waste capacity that you don't absolutely have to use.

Another suggestion, to make this experimentation easier: you might want to install List Shifter and use it as your combination data loader and storage. Yes, you can use a headless repeating group or a custom state, but List Shifter is just a lot more convenient (and it also throws an Initialized/Updated event when it's finished loading, which makes benchmarking more reliable). Additionally, even if you don't want to use List Shifter, this plugin includes my Debug Buddy action, which allows you to benchmark the time between different events (this is how I'm able to tell you above how long such-and-such an operation takes).

  4. I'm actually working on a server-side version of List Shifter's Process List action, which would give another route for doing stuff like this that might be lower-impact capacity-wise. (Even on Hobby accounts, it seems to be able to import a list of about 13,500, and you get a few thousand more on a Personal plan.) For 5,000 Things, it takes about 20 seconds to receive the input list and then about 6.8 seconds to do a basic process on the list (like transform that list into something JSON-like). For 10,000 Things, time to execution is about 50 seconds, and about 12 seconds to run a simple process on those 10K items.

(Which just shows us again how stingy Bubble is with compute, right? The same processing part takes just milliseconds on a decent PC in the browser.) But peak capacity impact is quite a bit lower:


:point_up: the part on the right is me running a 5K followed by 10K Server Side Process List action.

Anyway, the point of that plugin is that you could do iterative stuff in backend workflows and, even though they are slower, for things like "Hey, take this list of Contacts, process it, and then pass it off to my email sending service," it's fine.

At the end of the day, @gf_wolfer, I guess my point is that even if your on-page stuff is written sub-optimally, that doesn't impact capacity, and even a slow, poorly-written script will execute orders of magnitude faster than the time it takes to actually return the results of a search for 1,000, 5,000, or 10,000 items.

So I think the ultimate solution is "buy more capacity"… But as I have been fond of asking, "Exactly how much capacity do I have to buy before my Bubble compute feels like it's running on a Raspberry Pi versus a Casio digital watch circa 1986?" :woman_shrugging:


As usual, your level of detail and curiosity about these types of questions is amazing. Thanks for the deep dive into this.

To answer a few questions:
Yes, this is the List Item Expression.

Yes, the main annoyance is having to load 500-5,000 Contact data records before performing this operation, so thanks for the detail on different ways of evaluating the load time. I think being able to do it server side is eventually what I want, so that the user experience is seamless and the app only loads the Contact data when an Email Blast is actually 'sent'.


This topic was automatically closed after 70 days. New replies are no longer allowed.