Database performance when filtering by lists

I’m working on a project where users will submit survey responses. A response belongs to a Dataset. The user can add multiple Datasets into a Collection for viewing. See a sketch of the data structure below.

When the user views a collection, I’m planning to hold this data using a Repeating Group of responses that “contains” the collection_id in its “list of collections” list. Will filtering by items in a list drastically reduce the performance of this query?

Gonna have to give the textbook ‘It depends’ on this one.

There are two things that I feel most drastically affect performance.

The first is the length (and weight) of the list. Lists are downloaded in their entirety and filtered on the client; they cannot be filtered inside the Bubble database. Long lists are generally a bad idea on Bubble (i.e. >500 elements), while filtering small lists of 10–100 items can be fast and cause no problems.

The second is the complexity of the filtering. If you are running simple filtering on Dataset fields that are downloaded to the client anyway, it will be quite fast. If you are filtering on Collection fields, you can instead add those filters as constraints on your ‘Do a search for’. On top of that, the filtering needs to be simple, essentially boolean masking. If you need more complex filtering (using Bubble’s Advanced filtering operator), performance will be reduced. I’ve seen apps slow to a halt because of advanced filtering. The recipe for destruction is a filter applied to a list that needs to query the DB for every line item of the list.
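To make the difference concrete, here is a minimal Python sketch (not Bubble’s actual API; all names are invented) contrasting simple boolean masking over data the client already holds with an advanced-style filter that triggers one database round trip per list item:

```python
# Hypothetical sketch of the two filtering styles discussed above.

def simple_mask(responses, collection_id):
    # Boolean masking: one pass over data already on the client,
    # no extra network traffic.
    return [r for r in responses if collection_id in r["collections"]]

def advanced_filter(responses, lookup_in_db):
    # Anti-pattern: N rows means N separate queries.
    return [r for r in responses if lookup_in_db(r["id"])]

db_calls = {"count": 0}

def lookup_in_db(response_id):
    # Stand-in for a per-item 'Do a search for' inside a filter.
    db_calls["count"] += 1
    return True

responses = [
    {"id": i, "collections": ["c1"] if i % 2 == 0 else ["c2"]}
    for i in range(10)
]
print(len(simple_mask(responses, "c1")))  # 5 matches, zero DB calls
advanced_filter(responses, lookup_in_db)
print(db_calls["count"])                  # 10: one query per row
```

With 10 rows the per-item lookups are tolerable; with thousands of rows and real network latency per call, that second pattern is what grinds apps to a halt.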

This thread could also be useful when thinking about Bubble DB structuring:

And @petter 's book on Bubble performance is a great read to understand Bubble’s dos and don’ts.

I can see that you have dealt with relational databases before. In an effort to make relations more intuitive, Bubble approaches them slightly differently and abstracts away foreign keys.

For example, your Repeating Group’s data source will simply be “Collection’s dataset_ids”. This expression does not return the PKs of the Datasets; it returns the Datasets themselves.
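A small Python sketch of the two mental models (field and record names are invented for illustration; this is not how Bubble stores data internally):

```python
# Sample "Dataset" records keyed by primary key.
datasets = {101: {"name": "Survey A"}, 102: {"name": "Survey B"}}

# Classic relational model: the collection stores foreign keys, and
# you resolve them yourself (the explicit "join" step).
collection_fk = {"name": "Q3 results", "dataset_ids": [101, 102]}
resolved = [datasets[pk] for pk in collection_fk["dataset_ids"]]

# Bubble-style: the field behaves as if it holds the related things
# directly, so "Collection's Datasets" yields Dataset records, not keys.
collection_bubble = {"name": "Q3 results", "datasets": list(datasets.values())}

print(resolved[0]["name"])                       # Survey A
print(collection_bubble["datasets"][0]["name"])  # Survey A
```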

Give some examples of what type of filtering you need, and the estimated max length of the dataset_ids list.


Appreciate the detailed reply Nico. Here is an example that will be common in this app.

  • There are two datasets, each with 1,000 responses
  • Both datasets belong to a collection
  • The user chooses to view the collection and expects to see every response from both datasets.

On the topic of list length, I expect most datasets will belong to just one collection. Some will belong to fewer than three, with the worst case being 10 collections.

How would these constraints affect performance?

I think it’s hard to give you a clear-cut answer: not only will the (varying) volume of data play a part, but as you probably know, performance is also likely to be impacted by the aggregations and groupings you perform on the data afterwards.

(@nico.dicagno has offered some useful insight and resources where you could explore best practice)

IMHO, you may need to test this out with different permutations to get assurances and see how it performs. If you’re building an app that you intend to scale, it is worth spending a bit of time on this now so you don’t have to overhaul it later. One alternative is to use a different DB for this part of your app (e.g. Supabase). Store all the answers there and have it handle the combining and aggregation of the surveys - it’s lightning fast, and you’re unlikely to run into any performance issues at these volumes of data.
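As an illustration of why an external SQL database handles this scenario comfortably, here is a minimal sketch using Python’s built-in sqlite3 as a stand-in for something like Supabase (table and column names are invented; a real setup would use Postgres and the Supabase client). The point is that the join and aggregation happen server-side, so only the result travels to the client:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE datasets (id INTEGER PRIMARY KEY, collection_id INTEGER);
CREATE TABLE responses (id INTEGER PRIMARY KEY, dataset_id INTEGER, answer TEXT);
""")

# Two datasets in collection 10, each with 1,000 responses,
# matching the scenario above.
con.executemany("INSERT INTO datasets VALUES (?, ?)", [(1, 10), (2, 10)])
con.executemany(
    "INSERT INTO responses (dataset_id, answer) VALUES (?, ?)",
    [(d, "yes") for d in (1, 2) for _ in range(1000)],
)

# Server-side join + aggregation: one query, one small result.
(count,) = con.execute("""
    SELECT COUNT(*) FROM responses r
    JOIN datasets d ON d.id = r.dataset_id
    WHERE d.collection_id = 10
""").fetchone()
print(count)  # 2000
```

Queries like this over a few thousand rows are trivial for any SQL engine, which is why these volumes are unlikely to cause performance issues there.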
