
Database: very wide table performance? / Custom Type w/ a lot of fields

Is anyone aware of the performance implications of having a very wide table (i.e. a custom database type with a lot of fields) in Bubble on the main cluster? I'm looking at 200 or so columns.

From a technical standpoint, the only thing I'm aware of is that the underlying db is Postgres. Other than that, I've got no idea how things like db caching work, and I suspect a lot of the preconceptions I hold about very wide SQL tables are now somewhat outdated. In fact, I'm wondering if splitting things up into multiple separate types might actually slow things down, based on how things are queried in Bubble.

Would a wide table have implications for a simple lookup via a UID (are these lookups cached?), or would we see performance issues only with more complex queries and a lot of rows?

Appreciate any thoughts people have on the matter :slight_smile:

The thing to bear in mind is that Bubble loads everything the user has permission for, even if all you want is a one-character field.

In SQL terms … everything is a SELECT * FROM into the Browser.
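
Roughly, in Postgres terms (the schema and names here are made up for illustration; Bubble's actual queries aren't public):

```sql
-- Hypothetical stand-in for a wide Bubble data type
CREATE TABLE things (
    uid        text PRIMARY KEY,
    status     text,
    big_notes  text,                       -- ...imagine ~200 more columns
    created_at timestamptz DEFAULT now()
);

-- What you might expect when the page binds a single field:
SELECT status FROM things WHERE uid = 'abc123';

-- What effectively happens: every field the user is permitted
-- to see travels to the Browser, however wide the row is.
SELECT * FROM things WHERE uid = 'abc123';
```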

Performance on complex queries is variable. It can be very quick at times, even with around half a million rows. We suspect that indexes are built behind the scenes that help here.

Other times it can be really slow. The issue with “wide” data is ending up with Bubble’s version of a “tablespace scan” over a load of rows in the Browser. The more you can do on the Server the better (in Bubble terms, that means avoiding too many filters/Advanced filters, etc.).

The real key to moving from SQL-style to Bubble (object-relational) thinking is forgetting everything you know about Foreign Keys. You do not want to be trying to JOIN tables on anything. Use Bubble’s field linking to embed a row or a list of rows in another row.

(but don’t make your lists too big).
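
Loosely speaking, a Bubble list field behaves like an array of unique ids stored on the row itself rather than a join table. A hypothetical Postgres sketch of the difference (names invented):

```sql
CREATE TABLE orders (
    uid   text PRIMARY KEY,
    total numeric
);

-- Bubble-style linking: the "list of Orders" lives on the Customer row
CREATE TABLE customers (
    uid        text PRIMARY KEY,
    name       text,
    order_uids text[]   -- keep this list smallish
);

-- "Customer's Orders" then resolves by id lookup, not by a JOIN:
SELECT o.*
FROM   orders o
WHERE  o.uid = ANY (SELECT unnest(order_uids)
                    FROM   customers
                    WHERE  uid = 'c1');
```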

Bubble can be remarkably quick without needing to understand complex database stuff.


Just about to have a meeting with Bubble tech about exactly this. I can’t get it to work.

Bubble has artificial limits, it seems (no matter how much you spend on plans and boosts). 2 “workflows” per second might sound quick until you have half a million rows to get through: at that rate one table alone takes roughly three days of continuous running, and then you have to do it again for each additional table. I have calculated around two weeks of run time to migrate! And that’s before solving the data inconsistencies.

It is almost impossible to run in parallel unless you are 100% certain you won’t get contention on certain rows (which, when you have relational data, is unlikely).

Hitting maximum CPU doesn’t just “rev limit” you; it actually causes a large number of random failures, like not being able to read all the rows in a table. It doesn’t error, it just gives you entirely the wrong result.

Ending up with malformed data then gives you the problem that Bubble does not delete things very well at all. We have hundreds of thousands of rows that have been given a “for deletion” marker simply because we don’t have the horsepower or ability to actually delete them.
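
For anyone facing the same thing, the workaround is effectively a soft delete. In SQL terms it amounts to something like this (hypothetical schema; in Bubble it is a yes/no field plus a condition on every search):

```sql
CREATE TABLE things (uid text PRIMARY KEY);

-- Instead of DELETE (too slow at this volume), flag the rows...
ALTER TABLE things
    ADD COLUMN marked_for_deletion boolean NOT NULL DEFAULT false;

UPDATE things SET marked_for_deletion = true WHERE uid = 'abc123';

-- ...and exclude them from every search until a real purge is feasible.
SELECT * FROM things WHERE NOT marked_for_deletion;
```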

Parabola should be the way to go there (although the 1-hour time limit is very limiting), but it is very sensitive to input data (in my case it was completely thrown by some JSON in MySQL fields). It does let you create the Bubble relations as you go, but then you hit the issue of knowing where to stop and restart after the allotted hour.

The frustrating part is that the CSV import is quick and solid. Very impressed. But you then can’t actually do much with that data once it’s in, if it is not in “Bubble-friendly” form.


I think I screwed up in my reply to you; another thread got pulled in here. Nevertheless, your insights on the other topic are great!

Re: this thread

Is this only when calling up lots of rows and displaying them on the page? Am I right to assume that, for the sake of design simplicity, Bubble just queries and ships the entire thing to the page and lets the page parse it out? In other words, the back end doesn’t need to know anything about the code on the page.

Is this also true for back-end searches, etc.? How about if I need to ask a question of a type, but I don’t need to display the type’s data on the page? For example: I need to query this very wide table and show how many things of that type exist (a count), but I have no reason to display the type data. Would you happen to know if a query like that would perform a full table scan if it were part of a dynamic expression defined on the page?
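
For comparison, in plain Postgres a count never ships row data, and with an index it can avoid reading the wide rows altogether; whether Bubble short-circuits the same way is exactly what I’m asking (schema hypothetical):

```sql
CREATE TABLE things (
    uid    text PRIMARY KEY,
    status text
    -- ...imagine ~200 more columns
);

-- An index on the searched column lets the count be answered
-- without touching the wide rows, let alone downloading them.
CREATE INDEX things_status_idx ON things (status);

SELECT count(*) FROM things WHERE status = 'active';
```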

Am I correct to infer that Filters/Advanced filters are performed client-side? These operations are also available in server-side workflows.

Yikes. I’m guessing the Bubble REST Data API doesn’t help you either?

Wondering if you / anyone might be able to help me with the last post as well :slight_smile:


Yes, so the backend will look at the Privacy settings and send anything that the front end is allowed to see. Thus you could use privacy settings to limit data.

Legend has it that “linked” data is also sent (e.g. the fields in a list of a different type) - but quite how deep and when … is a mystery.

As long as the query uses a Condition then you are fine. Conditions are done on the Server, and are usually pretty quick.

Once you get into Filters (and advanced filters are usually the slowest) then you are in the browser. So try to limit as much as possible with Conditions on the Server.
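
If it helps, here is the same idea as a rough SQL analogy (not Bubble’s real internals; table and column names invented):

```sql
CREATE TABLE things (uid text PRIMARY KEY, status text);

-- A Condition: the Server narrows the rows before anything is sent.
SELECT * FROM things WHERE status = 'active';

-- A Filter/Advanced filter behaves more like this, except the
-- narrowing step runs in the visitor's browser, row by row:
SELECT * FROM things;   -- everything comes down first; the browser
                        -- then discards the rows that don't match
```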


Yes, it really does depend on the query. Legend has it that Bubble builds indexes dynamically (it also has unique-id indexes, which is the only time Bubble is sure that something is unique: when it is its own id!) so some queries can run faster than others.

I have a question outstanding with the Performance project about why some people have terrible trouble with queries, and others report them as OK. It can’t just be that some people are “better” at writing them.

Advanced filters (where you are taking each row and then running a search on it) are the least favourable. Sometimes they are unavoidable, but in this case I would suggest denormalising your data. In Bubble, “third normal form” is not usually your friend :slight_smile:
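
By denormalising I mean copying the searched value onto the rows you are searching, so the per-row lookup disappears. A hypothetical sketch in SQL terms:

```sql
-- Normalised (3NF): searching Orders by customer country means
-- reaching through the link on every row; in Bubble terms this
-- is Advanced-filter territory, evaluated row by row.
CREATE TABLE customers (uid text PRIMARY KEY, country text);
CREATE TABLE orders    (uid text PRIMARY KEY, customer_uid text);

-- Denormalised: copy the country onto each Order when it is set,
-- and the same search becomes a plain server-side Condition.
ALTER TABLE orders ADD COLUMN customer_country text;

SELECT * FROM orders WHERE customer_country = 'GB';
```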

The API is also rate limited. So it is a great way to do a lot of things in a long(ish) period of time.

I believe the slow delete is being looked at. So fingers crossed.

This is a really good resource.

Amazing. How have I not run across this!? :slight_smile: Purchased.

3NF isn’t a risk/concern in my app so I’ve already taken this advice :slight_smile:

I just spent the last few hours reading through this book. Wow, what a fantastic resource! I wish I had had this when starting out with Bubble :slight_smile: I’ll definitely be referring to it in the future.

In the book, @petter uses the term heavy data types to describe what I think I’m asking about here in this thread, i.e. custom types with lots of stored content as well as many different field types full of unstructured data.

The proposed solution for faster searching through these heavy data types is to create light data types (aka satellite types) containing only the fields you wish to search, and to associate the light satellite things with the corresponding heavy things.
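
In table terms, my understanding of the pattern looks something like this (all names hypothetical):

```sql
-- Heavy type: lots of wide content fields
CREATE TABLE articles (
    uid  text PRIMARY KEY,
    body text
    -- ...many more wide fields
);

-- Light satellite: only the handful of searchable fields,
-- plus a link back to the heavy row
CREATE TABLE article_search (
    uid         text PRIMARY KEY,
    article_uid text REFERENCES articles (uid),
    title       text,
    author      text,
    published   date
);

-- Searches scan and ship only the narrow rows; the heavy row is
-- fetched by uid only for the item the user actually opens.
SELECT article_uid FROM article_search WHERE author = 'petter';
```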

While I understand the mechanics of what is being proposed, I’m not exactly sure why it works. Is it to force whatever magical algorithm Bubble uses to index these particular ‘searchable’ fields/columns inside the tightly-scoped satellite light types? Or is it to avoid some kind of boilerplate ‘full table scan’ that Bubble does, which is laborious for the db to process and ship and for the browser to download, as @NigelG (and the book) said?

Maybe I’m overcomplicating (or oversimplifying) this, but I don’t understand why searching for one or many rows in a wide table would be an expensive operation in Bubble, and why the satellite type would even be necessary. We aren’t talking about a computed value in these satellite light data types, just a clone of the same value that exists elsewhere in the heavy type, only without the sibling data.

If I have 155 columns in my type and need to display 5 of those columns in an RG times 200 things, what it sounds like Bubble is doing is fetching all 155 columns × 200 things and throwing away 150 of those columns for each row (31,000 pieces of data fetched from the db and downloaded into the browser, of which 30,000 are never shown?).

The book also seems to suggest that this doesn’t occur (as in, only the 5 columns are shipped to the browser), which appears to be in direct contradiction to the above, so I guess I’m asking: which one is it?


I think @petter and I are saying the same thing here.

What he says is that if you have a LIST of 2000 things on a record and you only want 200 of those, so you filter the list … you will get the whole 2000 and the browser will have to remove 1800.

Whereas if you do a search then you will only get 200 as the server will have done the condition for you.

In the book he is talking about searching and filtering on results, which is likely to happen in the browser, and also about cases where you might want to text search across a number of fields (but not EVERY field).

This is a particular use case for manipulating what is in effect a lightweight index and then going to the detail record for the detail.

Things may well have changed, but you can “prove” this by having a simple page with a list / detail view.

Populate your list with a subset of the data. Then have a button that “sends” one of the rows to the group to display the detail. If you check the Developer tools Network tab in Chrome (or your browser of choice) you should see that it doesn’t download the remaining data from the server. You can do the same trick with “linked” data as well. So have a detail field with a list of other rows and display them.

Now, maybe Bubble is super clever here and knows that somehow you want to display all the fields in the Group at a later date. But it seemed to me that Bubble would download everything it needed ONCE and not keep going back to the server to get extra columns.