The Ultimate Guide to Bubble Performance - the new edition is out (now 210 pages!)

@petter Hi Petter! I bought the book today and skimmed through it, and it has some interesting topics in it! Great material for the community!
I have 2 questions about the data structuring section: one about the DB structure and another about the “Search Data Types” that @anon10873777 also mentioned earlier in this post.

  1. Data structure:
    You are using a bidirectional relation, but it seems that when doing a LOOKUP, Bubble gets ALL the data from the relation, causing the query to load lots of data. My approach would be to decouple them and only relate the data to the company. When you need to search for the current company's data, you just do

search for data WHERE company = this company

  2. Search data types

Let’s use a totally hypothetical scenario: a car racing website with info about cars and their races (let’s suppose each car has only 1 race). My client needs to show certain data in a list (RG).
Take the following data structure (Race and car_data are related to car, but the relation is not bidirectional; if it were, then when I want only cars, each car would search for all the data inside the related thing).

With the ‘search data type’, are you saying A) to unify the query fields in the main table, or B) to create a NEW table with duplicated fields (that get updated via a backend trigger)?

If B is the answer:

Sorry if it’s too long, :sweat_smile:

Thank you again for the great book!

Hi @tgmoron

Thanks for the kind words about the book!

  1. Bubble only performs the lookup if needed. If the Thing is not displayed on the page, Bubble will download only its unique ID (which adds about 32 characters extra). So you don’t need to be afraid to connect Things in one or both directions - that adds very little to the overall data load.

I’m not sure I understand your scenario 100% here, but in general there are two reasons to use search data types:

The first is to make a search more performant by setting up a lightweight type in scenarios where the original data type has a high data weight (such as a blog post).

The second is to combine two or more different data types into one search (such as searching for restaurants, hotels, destinations, etc in one search bar on TripAdvisor).

There could be other uses of course, but these are the ones I outline in the book.

How would you create those backend triggers? Would I have 1 workflow per field? (when car name changes, when introductory name changes, when race name changes, etc.)

Well, a backend trigger will fire even if there’s no condition on it, so technically you can set it up to fire no matter what field is changed, and then simply copy all fields from the original to the search data type every time. There could be plenty of good reasons to set up multiple triggers depending on what they are supposed to do, but it’s not something you’ll need to do every time.
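As a rough illustration of the "one unconditional trigger that copies everything" idea: Bubble triggers are configured in the editor, not written as code, so the Python below is only a model of the logic. The type names (`Car`, `CarSearch`), the field names and the `on_car_changed` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """Hypothetical 'original' data type with a heavy field."""
    unique_id: str
    name: str
    country: str
    race_name: str
    description: str  # heavy field, deliberately NOT copied to the search type


@dataclass
class CarSearch:
    """Hypothetical lightweight search data type."""
    car_id: str  # reference back to the original Car
    name: str = ""
    country: str = ""
    race_name: str = ""


# Fields kept in sync on the search type (an assumption for this sketch).
SYNCED_FIELDS = ("name", "country", "race_name")


def on_car_changed(car: Car, search: CarSearch) -> CarSearch:
    """Model of a single trigger with no condition: whichever field changed,
    copy every synced field from the original to the search record."""
    for field_name in SYNCED_FIELDS:
        setattr(search, field_name, getattr(car, field_name))
    return search
```

The trade-off is that every change rewrites all synced fields, which keeps the setup to one workflow at the cost of a few redundant writes.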

Hi, thank you for answering!

That means that, for example, if I have a CAR table with its data table related:

  • CAR TYPE
  • Car data (relation) [32bytes]
  • Car race (relation) [32bytes]

and an RG that brings the data as “Current Car’s > Car data > Country” == Output: Belgium [7 bytes], will it load only that field’s 7 bytes of data, or will it load all fields in “Car data” and then filter Country? I mean, when getting the ‘Country’ key, does it also preload every other key’s data behind the curtains (for faster indexing later)?

Do you have an example workflow of this? Maybe a screenshot? Is the ‘Search Type’ related to the original things in any way? (If I’m right, I understand that there is a searchType table with 1 search type per item, so if I have 1000 locations and 1000 restaurants, my search type table would have 1000 items in it.)

Thanks !

Hey @tgmoron,

Whenever you reference one data point, even if just to get a Unique ID to do a lookup, Bubble downloads the whole thing. So in this case, yes, it will download the full record for Car Type, Car Data and finally the name of the Country. Privacy Rules will of course stop any private fields from showing up.

A couple of points though:

  1. Bubble have hinted that this is probably going to be made more efficient in the future. They haven’t provided a timeline or priority schedule for this, but it’s a move that makes sense.
  2. Keep in mind these are still pretty efficient queries: while I do advocate doing performance optimization, it’s also important to not get too mathematical about it and try to shave off every thousandth of a second you can identify.

That being said, it’s a good thing to be aware of how these choices affect your app. And then decide where you draw the line as to how much you’re willing to invest in optimizing.

Do you have an example workflow of this? Maybe a screenshot? Is the ‘Search Type’ related to the original things in any way? (If I’m right, I understand that there is a searchType table with 1 search type per item, so if I have 1000 locations and 1000 restaurants, my search type table would have 1000 items in it.)

They are related in that they both directly reference each other via Unique IDs and that they both contain information that’s synced from the “original” to the satellite data type. So if the original is a Car called “Ford Mustang II”, then that name is also saved as “Ford Mustang II” (text) on the satellite data type. So we’re actually storing data in multiple places (which traditional database practices would call a no-no), but the upside is that we don’t need to reference the name of the original when we’re displaying the satellite (like in your previous example).

First of all, thank you for this guide! Super helpful, though I’m really just beginning to dig into it.

I had a question, if you’ve got a minute to offer your opinion. We have an app that helps users enter & track activities. We recently released a reporting feature that allows scheduled reports, which is entirely backend workflows that run either weekly or monthly.

With this release, we’re starting to see capacity issues on the personal plan when one of these scheduled workflows runs. It differs between users, but sometimes it’s running a report on 1 user’s activities, sometimes 8-10 users’ activities. The larger ones might have over 1,000 activities that the workflow is adding to CSV files (one per user in the account), then attaching those files to an email the account admin receives.

Other than this, our capacity is pretty consistently below 20%, but when a report runs, it’s quite often going over 100%. Since our capacity issue seems largely caused by backend workflows, in your opinion, do you think upgrading to the Professional plan on Bubble would resolve this, or is this one of those things where we probably just need to optimize our backend workflows?

I’ve noticed that backend workflows tend to be quite resource intensive, even if they only have 1 workflow action updating a single text field. We dealt with this when trying to update a user’s “last time active” every 120 seconds using a backend workflow, which I quickly realized was stupid and switched to a frontend workflow. Problem gone.

With scheduled reports, though, obviously we’re forced to make those backend workflows. I’d love to get your thoughts.

(I also PMed you, but I figured I’d reply here for others to see and benefit from. Feel free to ignore the PM if you respond here.)

I have started reading your book and I think it is probably the most important thing a new user can read. I am incredibly grateful for this resource at the start of my Bubble journey. It is going to save me pain - some of which I had an idea I was heading for and probably much more I did not yet realise. Thank you Petter for great work.

Hey Sam, thanks for buying the book and I hope you find it useful!

I’m not sure I completely and correctly understand exactly what your backend workflows are doing, but I’ll try to offer my general advice and feel free to add more detail if you don’t find that you get a good enough answer :slight_smile:

First, regarding backend workflows, there’s really no reason to expect them to be any more resource demanding than on-page workflows, so if you’re finding them very taxing on your capacity it’s likely more about the work they are doing. If you have a clear indication that they are more demanding, I would get in touch with Bubble support and see if there are any underlying glitches causing it.

Regarding upgrading your plan, I generally advise treating that as the absolute last resort: upgrading your plan for capacity reasons should really be because your app scales, and not because it’s struggling with a low or stable number of users.

It’s hard to say without seeing your app, but I suspect you may have some unintended looping going on in your backend. Have you checked logs and scheduled workflows to see if there’s a higher number of scheduled workflows than expected? Especially if you run recursive workflows it’s very easy to miss something minor in a condition and have a workflow run amok.
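To make the "workflow run amok" risk concrete, here is a small Python model of a workflow that reschedules itself. It is not Bubble code: `run_scheduled`, the queue and the `max_runs` cap are assumptions made for the sketch. The point is that the stop condition decides how many runs appear in the logs, so comparing the actual run count with the expected one is a quick sanity check.

```python
from collections import deque


def run_scheduled(items, max_runs=1000):
    """Model a recursive workflow: each run processes one item and then
    schedules the next run. Returns (processed items, number of runs)."""
    queue = deque([0])  # each entry is the index the next run should handle
    runs = 0
    processed = []
    while queue:
        index = queue.popleft()
        runs += 1
        if runs > max_runs:
            # Safety cap: a wrong stop condition would otherwise loop forever.
            raise RuntimeError("run cap reached; possible runaway workflow")
        if index >= len(items):
            # Stop condition: nothing left, so do not reschedule.
            continue
        processed.append(items[index])
        queue.append(index + 1)  # the workflow reschedules itself
    return processed, runs
```

With a correct stop condition, a list of N items produces N + 1 runs (the last one only checks the condition and exits); a much higher number in the logs suggests the condition is off.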

Thank you so much for those kind words @anon71899553 :smiling_face_with_three_hearts:

@petter thanks for the response. In this case, I reached out to Bubble support, and the culprit is one workflow action from a 3rd party plugin. It’s the action which creates the CSV report file. Bubble took a close look at our backend workflows regarding this spike and said I actually set them up really well, and they didn’t think anything could be much more efficient. But when the workflow hit that one action, it spiked capacity to 25% running just one time. It’s made me delay pushing hard to scale because it’s taken down our app twice now since we added the Scheduled Reports feature, and this didn’t come up during testing (mainly because I don’t have test users with hundreds or thousands of records like our live users).

Know of any efficient CSV generators that can run in backend workflows, lol? I’m using 1T - CSV Creator Plugin | Bubble. It has no problem with the on-demand (frontend) reports, where it’s doing essentially the same action, but on the backend it’s spiking capacity. I’m going to reach out to the plugin developer and see if they may be able to help.

The reports are an essential feature, though, so I’m not sure how else to deal with the immediate capacity issue other than upgrading to a plan with more capacity, at least until we can find a more efficient alternative to 1T - CSV.

Ah, there you go! Glad you were able to identify it, and it shows the strength of Bubble’s support team as usual.

As for CSV plugins I don’t have any good recommendations unfortunately, but hopefully the plugin author will be able to help out.

Great insights here and in the book. Some questions though.

Let’s say I have a Project datatype and a ProjectData datatype. Then I also have a Task datatype. A task belongs to a project. And I want to be able to render the tasks of a project on the Project details page. On the Task datatype, is it better to add a reference to Project or to ProjectData or both?

I am leaning towards a reference to ProjectData.

Furthermore, am I right not to use the Tasks (list of things) reference on the ProjectData datatype?

Hey there,

I’ll be careful in saying “right or wrong” here, since there can be many reasons for doing the opposite of what I suggest here. That’s up to you.

But with the info I have from you as of now, I would

  1. Link the Tasks only to the ProjectData type
  2. Avoid saving tasks as a list and use Do a Search for instead (this gives you a higher degree of control in your Privacy Rules and you avoid saving too much data on the ProjectData type).

Thanks for the previous reply. A couple of questions about the search data type and the link data type.

Search data type:
I understand the example of the travel app having poidata (restaurant/hotel) and geodata (continent/country/city) in one repeating group. But how do you follow the route to the corresponding data container or app page?

Do you add a reference to the data container in the Search DataType, or do you handle this in workflows?

Furthermore how to fetch the correct page to navigate to and how to send the correct data. For example: I click on searchresult Hotel X. How do I navigate to app/hotels/hotel-x and when I click on Restaurant Y, how to go to app/restaurants/restaurant-y?

Link data type:
About “By creating a separate Link Data Type you would be able to tag a task with any other data type, stored in a single field”

How do I approach this exactly?

Task Link (Thing)
Properties:

  • Task (reference)
  • User (reference)
  • Project (reference)

Each record contains Task, User and Project references in separate fields.

Or Properties:

  • Task name (text)
  • User (text)
  • Project (text)

Each record contains Task name, User name and Project name in separate fields.

Or Properties:

  • Task name (text)
  • Tags (text)

Or Properties:

  • Task (reference)
  • Tags (text)

In the Tags property I would add a comma-separated string containing the project name and user name.

Looking forward to seeing the best practices to follow.

Thanks in advance!

Search and Data data types

Yes, I add a reference to the Data Container in the Search DataType, so that I always know where to point to fetch the full records. Unless you are using a Link Data Type (which I only recommend in scenarios where it has other uses as well, so as to not over-engineer stuff) you’ll need to provide one field for each type of Data record it can potentially represent, i.e. HotelDataRecord, RestaurantDataRecord. This field being empty or not can then be used to determine what kind of record this one represents, and also send it to the correct page.
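A minimal sketch of the "which reference field is filled in?" routing described here, written as Python only to show the logic (in Bubble this would be conditions on a navigation workflow). The `SearchRecord` type, its field names and the `slug` field are hypothetical; the page paths follow the example in the question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRecord:
    """Hypothetical search data type with one reference field per record kind."""
    label: str
    slug: str
    hotel_data_id: Optional[str] = None       # filled only for hotels
    restaurant_data_id: Optional[str] = None  # filled only for restaurants


def destination(record: SearchRecord) -> str:
    """Pick the target page based on which reference field is not empty."""
    if record.hotel_data_id is not None:
        return f"app/hotels/{record.slug}"
    if record.restaurant_data_id is not None:
        return f"app/restaurants/{record.slug}"
    raise ValueError("search record does not reference any data record")


hotel = SearchRecord(label="Hotel X", slug="hotel-x", hotel_data_id="h1")
print(destination(hotel))  # app/hotels/hotel-x
```

Each new kind of record adds one more field and one more branch, which is the point at which a Link Data Type may start to pay off.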

Link data type
This one depends very much on your app, so it’s hard to say what’s best practice exactly. You will need the references to the original records, so you’re right to assume these should be included. If you are displaying these records somewhere (which is usually the case) I would also include some text information to show on it, whether it’s a tag or something else. The idea is to not have to load that information from the reference record, which will slow down performance (and partly defeat the purpose of setting up the link data type in the first place).

Hi thanks for all the additional info. Another question came up.

Currently I have the Datatype ProjectContainer and ProjectData. But now I want to link the logged in user to do the following.

  • Link one user as the project owner
  • Link multiple users as project member

A user can be an owner of multiple projects or can be a member of multiple projects.

To be able to do this I want to create a Link Datatype UserProject, with at least the properties:

  • User (reference)
  • Project (reference*)
  • Type [owner, member]

I want to apply privacy rules to all DataTypes.

On the page that renders the RG with projects, I am planning to change the data source from ProjectContainer to UserProject and add a constraint to only show projects where the User reference (either owner or member) equals the logged in user.

*But now I am wondering, should I link this DataType to ProjectContainer or link it to ProjectData? What would be the added value of the ProjectContainer datatype if it is not used in the RG? From a performance point of view, can ProjectContainer be replaced by UserProject by migrating the properties from ProjectContainer to UserProject?

@petter Read your book a few times. Very helpful so thanks for putting all that information together. I have some questions regarding Satellite Data Types.

My app is for a single user to log in and view their data. Not much searching (if any); so Satellite Data Types won’t optimize searches.

Nevertheless, out of concern of the user record getting too heavy, I’ve broken out the user’s data into separate tables (“Sub Tables”) which I’ve nested in the User table.

Bird’s eye view: there are 5 Sub Tables, each with 15 fields (nothing unstructured), and each Sub Table has a list of records (each list has at most 10 records, and no, that number won’t increase over time, and each record has at most 15 fields).

These numbers lead to:
(15 fields per Sub Table + (10 records in Sub Table list × 15 fields on each record of list)) × 5 Sub Tables = (15 + (10 × 15)) × 5 = 825 fields, plus 15 on the user, for 840 total fields

If you add the weight of all these fields together, even assuming an average of 100 bytes per field (reality is slightly above 50 bytes), the weight of all these Sub Tables is less than 1/10 of a MB (84 kB).
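The same estimate as a few lines of Python, using only the numbers from this post (the 100 bytes per field is the deliberately pessimistic average assumed above; the variable names are just for this sketch):

```python
SUB_TABLES = 5
FIELDS_PER_SUB_TABLE = 15
RECORDS_PER_LIST = 10
FIELDS_PER_LIST_RECORD = 15
USER_FIELDS = 15
BYTES_PER_FIELD = 100  # pessimistic average assumed in the post

# Fields in one Sub Table plus the records in its list: 15 + 10 * 15 = 165
fields_per_sub_table = FIELDS_PER_SUB_TABLE + RECORDS_PER_LIST * FIELDS_PER_LIST_RECORD

# All Sub Tables plus the fields on the user itself: 165 * 5 + 15 = 840
total_fields = fields_per_sub_table * SUB_TABLES + USER_FIELDS

total_bytes = total_fields * BYTES_PER_FIELD

print(total_fields, total_bytes / 1000)  # 840 84.0 (fields, kB)
```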

Bottom Line: it’s really not a huge amount to download and it’s downloaded only once when the user logs in.

Q#1: So, is it even worth breaking the data into Sub Tables from a performance perspective?

Q#2: even if Sub Tables are worth the hassle, does breaking the data into nested Sub Tables actually help at all? You’ve mentioned a few times that any nested objects have to be downloaded as well, implying that Sub Tables (if nested in the user which I find critical for this db design) won’t help performance over storing the data in the user table itself.

On the other hand, there are some threads out there that insist a nested object is only downloaded if its contents are referenced on the page (see, e.g., this forum comment).

Q#3: if nested objects are only downloaded when needed (if that’s not true, this question is moot), then the lists on each Sub Table would only be downloaded if referenced on that page.

Does that mean I can transfer the Sub Table lists to the user table directly (makes workflows a lot easier), as such lists will only be downloaded when referenced on the page, and if the list is referenced it will be downloaded whether it is on the user table or a Sub Table?

Q#4: Assuming performance has been resolved, can these Sub Tables help avoid race conditions and deadlocks?

There’s basically no info out there about how to deal with race conditions in bubble; although it’s a major problem, in my app at least.

I’ve built workers and queues to handle it (not sure why there aren’t any plugins that handle this or why bubble doesn’t provide the framework to easily avoid race conditions by allowing for unscheduled API Workflows (or even future ones) that can be run as an action from a different workflow (currently they can only be cancelled)).

With all this I’m still encountering race conditions and deadlock. Would Sub Tables help me avoid these issues? Are nested objects also locked if the housing/parent object is locked? More importantly, does a modification of a nested object impact or lock the housing object?

Hey Jacob,

It’s hard to predict exactly how that will behave, but my hunch is that it’s probably not worth breaking it up – I think the engineering and maintenance effort would not be worth it. Keep in mind also that loading sub tables adds to the loading time. Loading one single record is a lot faster, so breaking it up only makes sense if you’re only loading one table at a time.

Q#2: even if Sub Tables are worth the hassle, does breaking the data into nested Sub Tables actually help at all? You’ve mentioned a few times that any nested objects have to be downloaded as well, implying that Sub Tables (if nested in the user which I find critical for this db design) won’t help performance over storing the data in the user table itself.
On the other hand, there are some threads out there that insist a nested object is only downloaded if its contents are referenced on the page.

Only the Unique ID will be downloaded until the data is referenced somewhere on the page. So the actual lookup of that ID and the downloading of data doesn’t happen until that point.

Q#3: if nested objects are only downloaded when needed (if that’s not true, this question is moot), then the lists on each Sub Table would only be downloaded if referenced on that page.
Does that mean I can transfer the Sub Table lists to the user table directly (makes workflows a lot easier), as such lists will only be downloaded when referenced on the page, and if the list is referenced it will be downloaded whether it is on the user table or a Sub Table?

Yes, you’ll be downloading a list of Unique IDs only, so if you know this number will be negligible you can store it there with minimum effect on performance.

Q#4: Assuming performance has been resolved, can these Sub Tables help avoid race conditions and deadlocks?

You’ll have to provide some more detail here. In what kind of scenarios are you encountering issues?

Hi @petter,

Regarding your answers on Q2 and Q3, where the results you fetch are unique IDs, how does this affect the privacy rules settings?

I have read that a lookup by unique ID is not protected by privacy rules, am I right?

So when a field, for example, returns a list of companies that also contains the current user’s company, can other companies’ data be accessed by looking for other company IDs in the browser’s network inspector, even when a privacy rule for Current User’s own company is set on the Company data type?

Well, this one is one of the trickier parts of Privacy Rules.

First, to protect the IDs in the first place, that field (the list field) needs to be protected by a Privacy Rule (view field).

I have read that a lookup by unique ID is not protected by privacy rules, am I right?

Kind of, but it’s a bit more complicated. You’re right that searching for a record by its ID will produce a result even if that record is protected from being found in searches. You could say that the rule stops the record from being discovered, but not referenced directly. This does not mean that the record is entirely exposed though: as long as the fields are protected, they will remain confidential. So a hacker would potentially be able to confirm that there’s a record there, but not see its content.
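A toy model of the behaviour described in this reply, under stated assumptions: it is not Bubble’s implementation, and `Company`, `search`, `lookup` and the "own company only" rule are invented for illustration. It only shows the difference between a record being discoverable and its fields being readable.

```python
from dataclasses import dataclass


@dataclass
class Company:
    unique_id: str
    name: str


def may_view(viewer_company_id: str, company: Company) -> bool:
    """Hypothetical privacy rule: users may only view their own company."""
    return company.unique_id == viewer_company_id


def search(db: dict, viewer_company_id: str) -> list:
    """Search: records the viewer may not find are left out entirely."""
    return [c.unique_id for c in db.values() if may_view(viewer_company_id, c)]


def lookup(db: dict, viewer_company_id: str, unique_id: str):
    """Lookup by ID: the record's existence is confirmed, but protected
    fields come back empty for viewers the rule does not cover."""
    company = db.get(unique_id)
    if company is None:
        return None
    name = company.name if may_view(viewer_company_id, company) else None
    return {"unique_id": company.unique_id, "name": name}
```

In this model a viewer from another company gets an empty search result, while a direct lookup confirms that the record exists without revealing its name.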

So when a field, for example, returns a list of companies that also contains the current user’s company, can other companies’ data be accessed by looking for other company IDs in the browser’s network inspector, even when a privacy rule for Current User’s own company is set on the Company data type?

No, again they would be able to see that there is a company there, but not what it contains. Also, this would have to mean that the hacker was able to place an API call to Bubble’s database in the first place, which, while it may not be impossible, is probably not easy.

There is a potential risk when it comes to writing data though: let’s say you are looking at an edit form for a Company that’s being loaded through the use of a URL parameter containing that company’s UID. If a User were to replace the UID in the URL with one that they found in the list, that company would load in the form. The data would not be visible if those fields are protected, and autobind will not work if they too are protected. A workflow however would be able to write to that record, unless it’s protected by a server-side condition on the workflow or action (server-side meaning something that can be checked in the database such as Current User’s Company = This Company).

Hope that sheds some light on it.

(What’s going on under the hood is that Bubble switches from a search to a lookup when the UID is used, which can be considered a reference directly to a record: which is technically not a search and thus not protected by the search privacy rule. This is usually not a problem, but there are corner cases where it could make up a threat, as illustrated above.)
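The write-side guard mentioned above ("Current User’s Company = This Company") can be sketched like this. Again, this is a plain-Python model rather than anything Bubble exposes; `User`, `Company`, `NotAllowed` and `update_company_name` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class User:
    unique_id: str
    company_id: str


@dataclass
class Company:
    unique_id: str
    name: str


class NotAllowed(Exception):
    """Raised when the server-side condition on the workflow fails."""


def update_company_name(current_user: User, company: Company, new_name: str) -> None:
    # Server-side condition: Current User's Company = This Company.
    # It is checked against stored data, not against the UID in the URL.
    if current_user.company_id != company.unique_id:
        raise NotAllowed("current user does not belong to this company")
    company.name = new_name
```

Because the check compares stored data, swapping the UID in the URL loads a different company into the form but cannot get a write through.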

@petter it clarifies a lot. Thanks. I will keep an eye on the workflow conditions to also check for current user’s company.
