Optimal Database Design - Advice?

Hi everyone

I’m in a bit of quandary over the best way to create my bubble database to allow for an ‘optimised’ filter based search of tens-of-thousands of users.

I have a site that stores respondents information (for market research projects). For each user/respondent, there can be up to 50 individual pieces of info about them, in the form of answers to questions such as ‘email’, ‘gender’, ‘location’, ‘medical issues’ and so on. Information held can be in a variety of formats i.e. ‘number’, ‘text’, etc, (mainly in the form of dropdown selections but some with free-text input) and each question could have anything from 2 option choices (yes, no) to 200 option choices (think location dropdown for instance). The 50 questions are made up of 7 categories: (1) User [10 Questions], (2) Personal [17 Questions]l, (3) Work [7 Questions], (4) Tech & Comms [5 Questions], (5) Consumer Behaviour [5 Questions], (6) Utilities [3 Questions], (7) Communication Preferences [2 Questions].

What I’d like to know, is if there is an optimal database design/structure to house this information, particularly considering that we have to search through tens of thousands of users through filtered searches? For example, we might need to find males that have an income of X, live in a particular location, and have a certain mobile/cell provider.

Currently I have considered 3 possible ways of doing this but cannot determine the best/optimal choice.

  1. To have just one ‘User’ datatype/table (with the additional fields added to it) containing all the 50 data points.

  2. To have two related datatypes/tables, one for ‘User’ (with basic signup info) and the rest of the info/data points held in another datatype/table (lets call it ‘Info’).

  3. To have 7 related datatypes/tables, that represent the categories mentioned above. So one for ‘User’, and then ‘Personal’, ‘Work’, ‘Tech & Comms’, ‘Consumer Behaviour’, ‘Utilities’, 'Communication Preferences’.

Ultimately, I’m trying to plan for the worse case scenario where we have to search/filter across all 50 datapoints for 100,000+ users, without the query/result taking forever to be returned (or worse, timing out).

If you want to plan for a 100,000+ entry database, and have that be within Bubble (rather than an external database), your goal will be to ensure that every search is executed server side. This means searching only using the “Search for” box, and searching based on information that is directly in the datatype’s fields (rather than nested information). Bubble can search large amounts of data very rapidly, as long as it is “server side”.

The reason is that the moment you go outside this construct (via filters, advanced filters, intersects, nested searches, etc.) Bubble kicks the search execution “client side” to the user’s browser. Client side searches work fine with smaller amounts of data, but are not practical for large databases, since in a client side search, Bubble downloads massive amounts of data to the browser and has the search executed there.

That means no #3 in your list. In terms of a #2 approach (satellite data), I’d only advise it if you have fields that you don’t need to search, and which are data heavy. Otherwise just stick with #1.

If you are planning for a large database, I’d recommend investing some serious time getting to know the client vs. server side search issues deeply. This will save you some major headaches down the road and avoid a doomsday scenario where your app is unusable when it gets large. There are some great posts on the forum, including one by Bubble’s co-founder Josh, a book on performance by @petter, and also Bubble’s manual.

1 Like

Very useful @ed727. Thanks for the heads-up on this. I’ll probably get the book to get a deeper insight. Much appreciated.

2 Likes

Hey Ed. I got the book and it is very good (I’m about 30% into it), but I had an initial question that I was hoping you might be able to shed some light upon.

If I were to consider using another backend with Bubble (such as backendless), would this make the management of such a large database and with such likely complex database queries, more performant? I ask because I imagine it effectively removes quite a heavy resource away from Bubble in addition to the fact that someone like Backendless ‘specialises’ in performant databases. Or then again, could it cause problems not using the native Bubble backend?

Additionally, I imagine in the future, that it might be likely that I would connect to the database from other resources - imagine a dedicated mobile app created in AppGyver or similar (I am aware that you can wrap a Bubble app but I haven’t looked into this much yet).

Good question. Everything I’ve read (and my own experience) is that Bubble’s database is performant and scalable, and uses the same technology as other databases. However its two limitations are: a) it moves some complex searches to the frontend, which as mentioned aren’t scalable; and b) it’s not that fast at making changes to large amounts of data quickly (like if you had to change data in tens of thousands of records on a frequent basis; this is currently done via a recursive workflow that takes time to execute).

But if you don’t fall into one of these categories, Bubble’s database should be fine dealing with large numbers of records. And if you can stay within Bubble, that avoids the hassle (and time) involved with selecting and learning another system, then doing the API connections and debugging (not to mention any unforeseen issues or downsides you run into using an external database).

I’d recommend first learning Bubble and seeing if your search needs can be done within the “Search for” box. From what you’ve described, it sounds like they would. But then if you find they can’t then you can invest the time down the road investigating an external database and migrating.

Regarding the best way to launch a mobile app, I don’t know. Bubble’s database could technically be a backend via an API just like any other database; how it performs as a backend vs. other choices I don’t know.

2 Likes

Okay Ed. Thanks very much. Much appreicated.

Hi @stephen14

With which way did you end? And how is it going?

I have the same situation in which I have a datatype with 50+ fields and I need to search all of them and I’m a little stuck on how to do it.

Can you give me sone insight?
Fabian

Hi @ed727

May I ask you which external database offers this kind of solution?

I have 55 fields on a datatype (like a spreadsheet) and I have to keep them on one datatype if I want to filter them on the server side (with a linking table is no solution as you mention above)

And I don’t know if you have any experience with large volumes of data on one single datatype.
My question is do you know how much data (rows of records) can Bubble handle before it slows down?

Can it handle 100.000 rows (multiply by 55 fields)?

Thank you for your detailed answer above.
Fabian

Hi, I don’t have experience with integrating an external database with Bubble.

But any modern database architecture should be able to handle 100,000 records with no problem, and that includes Bubble (as long all the searching is done server side). For example, databases will do indexing in order to speed up frequently requested searches on large datasets, and Bubble will do this.

I recall seeing some people in the forum posting about having datasets this large and it not being a problem. It only becomes a problem when your searching starts getting pulled to the client side.

One note on searching on related datatypes (or “nested” searches, where you are running a search within a search). I recall when I was testing how Bubble ran nested searches, that Bubble ran the nested search first and downloaded those records to the browser. Then it took those results and used those as the relevant constraint for the main server side search. The takeaway is that if your related datatype in the nested search isn’t too large, then I don’t think a nested search would have a performance impact, because not that many related records would be downloaded to the browser. Hope that makes sense.

In general, normalizing data by having related datatypes is standard database best practice, so I would look to do that first and only deviate if you need to do a workaround for Bubble.

1 Like

Hi Fabian,
I’ve made a plugin (launching in semi-private beta next week) that enables instant search in Bubble. It works well whether you have 100 or 100,000 records. Let me know if that’s interesting to you and I can share more/invite you to beta. Best!

2 Likes

@ed727 Thank you very much for your detailed information.
I really appreciate it

One more question…
In my repeating group I have only one linked field which are images. And I do nested search to find the right image.
Do you think it will be a big problem in the future?

Thanks again
Fabian

It works with linked datatype as well??
P.S I’m interested

I’d be interested in being part of your Beta Test.

I’m not sure because I don’t know how Bubble would handle the image field. I assume you have a main datatype that then references an image datatype. And your image datatype has a field for the actual images, plus text or category fields you can search on.

My understanding of images is… when Bubble displays the image, it downloads the image to the browser. This can definitely cause slowdowns because images are data heavy. But if the image is not needed for display, then Bubble only downloads the AWS link to the image. (All this said, I recall some posts where people didn’t set it up correctly or had a buggy plugin that downloaded the images in the background, rather than the link).

So for a nested search, let’s say you are searching for Main Datatype items based on text fields in your related Image Datatype (like search for Users who have uploaded cat photos). Bubble will first do the Image Datatype search server-side, and download those results into the browser in the background, then will send those results back to the server for the Main Datatype search. If Bubble is only downloading the image links, then I wouldn’t think it would be too slow unless you had tens of thousands of image entries. But if Bubble is also downloading the images themselves, then it’s a problem.

The way to tell for sure is spend some time with Google’s Dev tools, and specifically the Network tab. That’s what I used when experimenting with different searches and database setups, as it would show exactly what Bubble was downloading to the browser as a result of each action and search, and how much time and data it involved.

1 Like

Hi, my understanding is that plugins can’t reach into what Bubble is doing server side – so any plugins that handle searching/filtering will only work client side, which involves downloading the full set of records to the browser. Is that correct?

Morning Ed!
You are mostly correct that any plugins that handle searching/filtering only work client side. But we don’t download the full set of records to the browser (like the fuzzy search plugin) as that is one of the worst ways to do search (read page freezes, high WU usage). We sync the Bubble DB with a search provider of the user’s choice (whether Algolia, Typesense, and we’re working on 2 others) and let that engine handle the search. We offer multiple providers within one plugin because 1) these services have become feature competitive (they all basically do the same thing) but also 2) because being able to switch/try others easily reduces vendor lock in (to the extent that our early users are saving 100s of dollars per year due to being able to “shop around” while minimizing plugin integration cost/time).

1 Like

Ah, understood, thank you.

1 Like

Hey Boston and @fabian.banushi , I’ve just messaged you both with details about our search plugin. Let me know if you have any questions/want to proceed with trying it and we’ll make sure to get you setup as soon as possible :slight_smile:

How do you tell which request is first… I’m looking at Dev Tools and I don’t notice anything (I even watched some YouTube about Dev Tools).

I looked search and msearch
Which come first? Or maybe I’m seeing the wrong thing?

Thanks again
Fabian

Hi, it has been a while since I did it, so I’m not sure.

I would watch the page itself load in the network tab. You’ll know you have the right file when you click into it and see actual data entries.

1 Like