Understanding how searches and filters work in Bubble

Knowing exactly how and where searches operate in Bubble is important for understanding your application’s performance. For example, if my data is already on the page, do expressions that use that data refetch it and cost workload units (WU)? What about :filtered?

This is based on my own research and understanding (it doesn’t come from the Bubble team, so don’t treat it as gospel). There are always lots of asterisks with how Bubble works, a ton of edge cases, and lots of ‘well yes, but also…’.

Searches in Bubble

How Bubble schedules searches

Let’s suppose our page needs to show Do a search for Users in a Repeating Group. What actually happens here?

Once our page is rendered and we can see some elements (not before - page load must occur before data is fetched in most cases), the Bubble engine knows which expressions it needs to evaluate in order to do things like show or hide more elements. Some of these relate to client-side data (e.g. page width, other elements’ visibility) and are evaluated virtually instantly.

Others require data from the database (think, show this when Do a search for X is not empty).

When the engine knows it needs database data, it adds the request to what you might call a queue: basically, a list of requests like ‘fetch me all Users where XYZ’.

Once 10 milliseconds have passed, or the queue has 30 pending searches, or the queue has 200 pending items across all searches (whichever comes first), the searches are batched together and run. Where possible, these are all combined into one request. You can see this in your network tab as msearch (likely meaning multiple searches).
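
To make the three flush triggers concrete, here is a minimal Python sketch of the rule as described above. It is not Bubble's code: the names `PendingSearch`, `should_flush` and the constants are made up for illustration, and treating "pending items" as the sum of items each search asks for is my own assumption.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as described in the post (observed behaviour, not official constants).
MAX_WAIT_MS = 10
MAX_PENDING_SEARCHES = 30
MAX_PENDING_ITEMS = 200


@dataclass
class PendingSearch:
    data_type: str        # e.g. "User"
    items_requested: int  # assumption: how many items this search is asking for


def should_flush(queue: List[PendingSearch], ms_since_first: float) -> bool:
    """Return True once any one of the three triggers is hit."""
    if not queue:
        return False
    total_items = sum(s.items_requested for s in queue)
    return (
        ms_since_first >= MAX_WAIT_MS
        or len(queue) >= MAX_PENDING_SEARCHES
        or total_items >= MAX_PENDING_ITEMS
    )
```

Under these assumptions, a single search waits out the full 10 ms, while a page that fires off dozens of searches at once is flushed early by one of the count limits.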

This will return up to 20 items per search in the response.

How search results are batched + paginated

But what if we need 500 items because, for example, we are displaying all rows of the repeating group immediately without lazy loading?

In this case, Bubble takes the (up to) 20 items that were returned in the first batch and calculates the average size of each item. Bubble targets a response size of 1MB. So, if 20 items came to 200KB (about 10KB each), the next request, to fetch the rest of the rows, will ask for 100 items, as it expects that to be about 1MB. The limit here is 400: if your Things are very small, it won’t just fetch them all at once, it will fetch a maximum of 400 in a batch.

The learning point here is that small Things (fewer fields, or less data in each field) will generally reach the front end faster, because more can be returned in a single batch, and each batch runs in sequence.
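
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule as I've described it: `next_batch_size` and `plan_batches` are hypothetical names, 1MB is taken as 1,000,000 bytes, and I'm assuming the batch size is worked out once from the first batch and then reused.

```python
from typing import List

TARGET_RESPONSE_BYTES = 1_000_000  # ~1MB target, per the post
MAX_BATCH_ITEMS = 400              # cap per batch, per the post
FIRST_BATCH_ITEMS = 20             # size of the initial batch, per the post


def next_batch_size(first_batch_bytes: int, first_batch_count: int) -> int:
    """Estimate how many items fit in ~1MB, capped at 400."""
    if first_batch_count <= 0 or first_batch_bytes <= 0:
        return MAX_BATCH_ITEMS
    avg_item_bytes = first_batch_bytes / first_batch_count
    estimate = int(TARGET_RESPONSE_BYTES // avg_item_bytes)
    return max(1, min(MAX_BATCH_ITEMS, estimate))


def plan_batches(total_items: int, first_batch_bytes: int) -> List[int]:
    """List the batch sizes needed to load total_items, in order."""
    if total_items <= 0:
        return []
    first = min(FIRST_BATCH_ITEMS, total_items)
    plan = [first]
    remaining = total_items - first
    size = next_batch_size(first_batch_bytes, first)
    while remaining > 0:
        take = min(size, remaining)
        plan.append(take)
        remaining -= take
    return plan
```

With the numbers from the example, `plan_batches(500, 200_000)` gives six sequential requests (20, then four of 100, then 80). If the same 20 items only weighed 2KB in total, the 400 cap applies and the plan shrinks to three requests.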

Fetching things by ID

Not all database interactions are searches. Sometimes, you’ll be fetching things by ID. You’ll never do this explicitly (though I wish Get thing by ID existed!). However, any expression like Current cell’s Invoice’s Company is fetching a Thing (Company) by its ID. This is because the Invoice stores the Company ID on its Company field. So, we have the ID, we just need Bubble to retrieve it from the database.

This is a bit more straightforward - from my understanding, Bubble just batches these into fetches of 200 Things at a time.
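
As a rough sketch of what that batching looks like (hypothetical helper, not Bubble's code), here is how a list of IDs would be split into groups of up to 200. Removing duplicate IDs first, for example when many Invoices point at the same Company, is my own assumption.

```python
from typing import List


def chunk_ids(ids: List[str], size: int = 200) -> List[List[str]]:
    """Split IDs into fetch batches of at most `size`."""
    # Assumption: each Thing only needs fetching once, so drop repeats
    # while keeping the original order.
    unique = list(dict.fromkeys(ids))
    return [unique[i:i + size] for i in range(0, len(unique), size)]
```

So 450 distinct Company IDs would go out as three fetches of 200, 200 and 50.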

Dealing with data already on the page

This is best illustrated with a diagram. Bubble is actually pretty smart (!). It tends not to fetch more data than it needs, and makes use of data it has already fetched on the page where possible.

Let’s take the case where we have a Repeating Group of Users, and we’re using :filtered on it (people sometimes do this and hope it doesn’t cost WU because it’s ‘filtering data that’s already on the client’.)

You might wanna just skip to the diagram, but it roughly goes like this:

First, Bubble checks whether all your filter constraints can be evaluated in memory. Basic comparisons like equals, not equal, contains, greater than, less than, is empty, etc. can all run client side. The big ones that can never run client side are:

  • Geographic filters (within X miles of Y)
  • Text contains, contains keywords, and the negations of these
  • Any field
  • Range contains / range contains point
  • Email field equals and email field contains

If any constraint can’t run client-side, that constraint must be sent to the server as part of the database query.

Assuming all constraints can run in memory:

  • Are there under 100 items in total, and are they all already on the page? If yes, no need to fetch more, we can filter locally.
  • Otherwise, Bubble checks its cache of search results that have already been downloaded, and considers whether finishing the load would take more than 3 batches of 400 items (1,200 total). If fewer than 200 items are left to load, or more than half are already loaded and fewer than 1,200 are left, it will fetch the rest and filter client side.
  • If completing the load would require fetching significantly more data than you actually need (more than 1.5× what you’re asking for), Bubble avoids the wasteful fetch and filters on server instead.
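
The decision steps above can be sketched roughly as follows. Treat this as my reading of the rules rather than Bubble's actual logic: the function name and the constraint labels are made up, and because I haven't spelled out how the 1.5× rule interacts with the other thresholds, it is folded into the final fallback to the server.

```python
from typing import Iterable

# Constraint kinds listed above as not running client side for a search.
# The string labels are invented for this sketch.
SERVER_ONLY_KINDS = {
    "geographic", "text_contains", "contains_keywords", "any_field",
    "range_contains", "email_equals", "email_contains",
}


def filter_runs_on(kinds: Iterable[str], total: int, loaded: int) -> str:
    """Return "server" or "client" ("client" may mean: fetch the rest first)."""
    # Step 1: every constraint must be evaluable in memory.
    if any(kind in SERVER_ONLY_KINDS for kind in kinds):
        return "server"
    remaining = max(total - loaded, 0)
    # Step 2: small list that is fully on the page already.
    if total < 100 and remaining == 0:
        return "client"
    # Step 3: finishing the load is cheap enough to be worth it.
    if remaining < 200:
        return "client"
    if loaded > total / 2 and remaining < 1200:
        return "client"
    # Step 4: otherwise loading everything would be wasteful.
    return "server"
```

For instance, with 850 of 1,000 Users loaded the sketch picks the client, while with 100 of 5,000 loaded it hands the filter to the server.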

Now, geographic address list fields, date range list fields, and number range list fields will always be filtered client side. By client side, I technically mean not in the database: if it’s in a workflow, then it’ll generally be on the Bubble server. But the part you need to care about is whether the data is returned from the DB, as that’s what dictates your WU cost and, largely, the performance too.

Hope you find it interesting :slight_smile:

I’m checking here every day for this type of content, thanks @georgecollier. I have a question about :filtered.

  • lets say i have a tasks db and i want to show them in status columns on frontend.
  • i tucked away a rg(element name: root-rg-task) in the page with a do a search for all tasks of current user, without status filtering.
  • then i put another rg(element name: rg-status) with data source get options:all task statuses, and nested another rg(element name: rg-task) in with data source root-list-task’s list of tasks:filtered[status: current cells status]

You stated that :filtered might not cost fewer WUs, so I tested the setup above WU-wise and it decreased by almost 70% compared to a Do a search for on each status cell. How do you think this dynamic works?

It goes back to the flow chart and depends on the number of results. But there are lots of edge cases and I’m simplifying a lot, so your mileage may vary!

Nice share. I haven’t had the time to experiment on something and want to check if you’ve tested this:

When passing loaded data as parameters to reusable elements, are they stored in memory as separate memory blocks, or do those parameters become pointers to the source?

If it’s the former, then passing around large values to reusable elements will eventually kill user experience as memory usage bloats.

This is ‘of the same data type’ and not across different data types, correct? mSearch is not an aggregate search for all data requested across the page for all data types on the page; it is an aggregate search of just a single data type. So 30 pending searches is 30 individual data requests for data of the same type, and 200 pending items across all searches is all searches of the same data type?

If so, does 200 pending items across all searches of the same data type imply that something like search A with filters 1, 2 and 3 and search B with filters 2, 4 and 5 would be aggregated on the network tab into a single mSearch, rather than shown as two separate mSearch requests since they have different filters (i.e. payload requests)?

This is not 100% of the story, I believe. In your research, did you test against Repeating Group height, specifically targeting cell height, and therefore the number of items in the Repeating Group that are necessary to populate the visual element on first draw and on subsequent ‘lazy loads’ as the user scrolls the repeating group?

Typically, what I’ve seen when testing is that there is a direct relationship to the height of the repeating group itself, and the cell height, implying the total number of items needed to fill the first page (first load) of the repeating group and each subsequent fetch for the lazy loading as user scrolls the repeating group.

A maximum of 400 items in one batch?

It should be easy for Bubble to implement. Throw an idea onto the ideaboard for that. They already do it for ‘current page thing’ and ‘get data from url’.

I do not believe this is 100% accurate. Just this week I saw an app that was performing 100s of mGets (fetching an individual item based on ID) via a repeating group that had a text field referencing the current cell’s thing’s related data item’s name. What this seemed to cause was an mGet for each related thing for each item in the repeating group. Of course, the fact that they weren’t batched into 200 things at once in a single request could be tied to the lazy loading of the RG.

Does this imply Bubble will not use the existing data on page to evaluate these and will perform the search? And in this situation, the data source using those filters is not referencing a list of items expected to be on the page already (like a plugin that holds list of things and exposes a state of that list) and is instead a data source of ‘do a search for’ with those filters?

Very interesting. I’d be curious to see how you set up your experiment to obtain these findings.

Yes, interesting. What would make it great are videos showing the setup of each different test you ran, showing the actual network tabs with description of what a user is seeing from those network tab readings and how to actually interpret those for themselves.

It is multiple searches and does not all need to be the same data type. 20 results are returned for each data type.

I’m not certain. Intuitively I think they’re stored as their own data rather than just a pointer to another group, but I’m not sure, sorry.

Sorry, to clarify I mean up to 200 things at a time (waiting 10 ms to attempt to fill a batch, then sending off the request etc like searches).

My original statement holds here. Geographic filters, contains keywords, range contains, etc. are all sent to the server. Bubble will NOT use existing page data to evaluate them. The database handles these.

However, if you’re using :filtered on an existing list (like a custom state or plugin state), these constraints CAN generally be evaluated client-side. Bubble has client-side implementations for all of these (e.g. geographic search calculates the haversine distance locally, and text contains does word splitting and matching in the browser).
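
For anyone who hasn't come across it, this is what a haversine-based ‘within X miles’ filter looks like in general. It's the standard formula written in Python for illustration, not Bubble's implementation; the function names and the (latitude, longitude) tuple shape are my own.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_MILE = 1.609344

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating point drift near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_miles(points: List[Point], centre: Point, miles: float) -> List[Point]:
    """Keep only the points within `miles` of `centre`."""
    limit_km = miles * KM_PER_MILE
    return [p for p in points
            if haversine_km(p[0], p[1], centre[0], centre[1]) <= limit_km]
```

Because this only needs the coordinates already held in the list, nothing has to go back to the database to apply it.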

But the decision for whether that filter runs in client or on server is based on these:

Thank you George

Thanks, George.

You always have good advice and insights.

They should put some of your guides in the documentation :slightly_smiling_face:

Is this then tied into the bulk_watch? The search_ids representing some internal check for all data types returned in the single mSearch?

I’ve experienced slowdowns when passing properties as data sets through nested reusables. Because of that, and the fact that reusable properties set as an option set do not allow for the ‘this option’ filtering capability, I am using reusable properties less and less.

Bubble can subscribe to updates for a Thing by ID, or a search. That’s used for real-time