Comparing lists with complex logic

completelygolden · December 8, 2021, 5:23am

Hello, I’m working on what is fundamentally a searching app.

The user picks a list of properties:
e.g.:
A,B,C,D.

I want to bring back a list of results, where each result doesn’t have any property not reflected in their choices.
The results don’t need all of their selection, but it can’t have anything that isn’t in their selection.

e.g.
(Returned)
Stereo: A, B, C, D
Walkman: A, B, D
Discman: B, D

(Not returned)
MP3: A, B, C, E
Record Player: E, F

The one thing to bear in mind is that there are a LOT of those records. 380k.
And each has 1-80 properties. Total unique properties across all records is something like 75,000.

I feel like I’ve tried almost everything in bubble, but I can’t crack it.
Also, currently I have the data in SQL, in 3 tables:
Records, RecordProperties, Properties.
I can reformat however I need to for bubble.
In SQL, I have it working as follows, assuming I’ve already converted a,b,c,d to 1,2,3,4 prior:

SELECT * FROM Records WHERE recordid NOT IN(
SELECT recordid FROM RecordProperties WHERE propertyid NOT IN (1,2,3,4)
)

joeyg · December 9, 2021, 6:56pm

Can’t you just use the SQL connector and filter using your SQL statement from above?

ed727 · December 9, 2021, 8:00pm

Interesting… there may be a way to do it in Bubble with an advanced filter, but that won’t work for you because advanced filters are client side and aren’t performant on large data sets.

If you were happy with an item containing any one of the properties the user selects, then this could easily be done server side by searching the joining table for entries that are contained in the list of properties the user selects. It’s the wrinkle that an item can’t have something that hasn’t been selected.

Because I’m also experimenting with variants of many-to-many searches and joining tables I’m going to try to explore this when I have some time, but I suspect I’ll come to the same dead end.

rpetribu · December 9, 2021, 8:14pm

That is what I was thinking…

completelygolden · December 9, 2021, 8:37pm

rpetribu

Yes, that’s what I’m doing now with an Azure instance. But it’s only ~1.3gb of data so I figured if I could get it done without leaving bubble I might see better performance and save some money.

Additionally, at the cheap levels of Azure/RDS performance still leaves a bit to be desired.

completelygolden · December 9, 2021, 8:38pm

Thanks, that’s the same problem I found I could solve and the same limitation I haven’t been able to work my way past.

It’s a kind of niche need, especially at this scale. But I was hoping if get lucky and someone would have cracked it natively already.

ed727 · December 9, 2021, 9:43pm

An idea hit me that would be server side, but not sure how performant it will be on a huge dataset since it involves a nested search.

In your search, when a user selects Properties, also have a custom state list that contains the properties that weren’t selected.

Search the Item table, and add a constraint that is a nested search for Items in the Items_Property table. In the nested search, it has 2 constraints… the Property is in the selected properties, and the Property is not in the CS list of unselected properties.

The trick with this type of nested search is you have to search based on a field’s value, and that value has to be unique. One would think you could do UniqueID, but Bubble restricts searches based on that. So you’d need to create an ID field for Items in the Items table and in the Items_Property table, so you can search based on that.

I haven’t tested this so I may have missed something…

joeyg · December 9, 2021, 11:14pm

Bubble is built on AWS, so whenever you do a “Do a search for”, you are essentially contacting a database on AWS. So I wouldn’t view it as “not leaving Bubble”. In my experience querying a large MySQL RDS database using the database connector, I’ve seen way better performance using straight SQL like you have above. For comparison, the Bubble workflow step that kicks off a select statement gets the results in about the same time as MySQL workbench. You do have to get around the 200 row return limit, but it sounds like you can live with that by paging your search results. Not sure about Azure costs, but you can run a t-series RDS instance on AWS in the free tier for a year and it won’t cost you anything.

The Bubble filtering mechanisms are a nice “no-code” alternative for simple stuff and it’s great for folks who don’t want to learn a whole new technology and syntax, but I don’t think they are really designed as a replacement for SQL and large datasets. SQL is tried and true technology, it’s robust and performant. If you have SQL chops and are building a complex app, I would look to use it for all your large queries and filtering. Then use Bubble to display the results and navigate from their. It’s been a great combo for me.

completelygolden · January 15, 2022, 3:02am

Long term follow up.
I never got the performance or functionality out of using the Bubble DB.
I picked an Azure DB just because of greater familiarity spinning up servers and a nice cheap $5mo server option.
Launched the app a few weeks ago and it’s been getting used.

Thanks for your help and input everyone!