How to find and delete duplicates in a list of database things in a backend workflow?

Hello guys,

I know this question goes a little bit into many Bubblers' uncomfy zone; however, I wanted to ask it anyway :slight_smile:

My application aggregates keywords from various sources, which I save to the database in a backend workflow. Since those sources might contain overlapping information, roughly 5–10% of the data ends up being duplicates.

At a later point in the flow I would like to identify and delete those duplicates. Can anybody here suggest a smart way of doing so?

  • What I know I could do is check, each time I save a keyword, whether it already exists. Yes, that could potentially work, but a) it's WU-expensive and b) it doesn't play well with my current setup, where I count the keywords and take further actions based on that count. If I now stop saving some of the expected (and already counted) keywords, I get into trouble elsewhere. So I'd like to find another solution, even one that also comes with high WU usage; that is okay for now.

  • Something I theoretically thought of: duplicate the list of keywords I already have, then compare the original list with the new list and take further actions (see the sketch after this list). But my gut feeling says there is probably a much smarter approach.
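Bubble has no code step for this, but to make the second bullet concrete, here is a minimal TypeScript sketch of "compare the list against a de-duplicated copy of itself" (the `Keyword` shape and field names are made up for illustration):

```ts
// A Keyword "thing" as saved in the database (made-up shape).
type Keyword = { id: string; text: string };

// Compare the list against a de-duplicated copy of itself: everything
// that does not make it into the copy is a duplicate to delete.
function findDuplicates(keywords: Keyword[]): Keyword[] {
  const seen = new Set<string>();
  const duplicates: Keyword[] = [];
  for (const kw of keywords) {
    if (seen.has(kw.text)) {
      duplicates.push(kw); // this text was already kept once, so this one can go
    } else {
      seen.add(kw.text);
    }
  }
  return duplicates;
}
```

Whatever `findDuplicates` returns would be the things to delete; the first occurrence of each text survives.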

Thanks for any hints!

After thinking more about this problem, the first approach above (checking on save) does not even work, because the save-keyword workflow runs multiple times in parallel while saving the keywords from different sources. The "Only when" condition is therefore not a reliable duplicate check: it is entirely possible that the exact same keyword is being saved at the exact same time.

Has anybody here ever had the same problem and solved it somehow in the backend?

Create a function (a custom event) that merges both lists and “spits” out a list of unique entries.
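As a sketch (in Bubble this would be workflow steps or a server-side plugin action rather than code), the behaviour of such a custom event boils down to:

```ts
// Merge both lists and keep only the unique entries
// (first occurrence wins, order is preserved).
function mergeUnique(a: string[], b: string[]): string[] {
  return [...new Set([...a, ...b])];
}

// mergeUnique(["seo", "ads"], ["ads", "email"]) -> ["seo", "ads", "email"]
```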

This may be a good use case to explore processing this list outside of Bubble in order to save WUs.
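One way that could look (a rough sketch, assuming the app's Data API is enabled and there is a Keyword data type with a text field; the app URL, token, type name, and field name are placeholders):

```ts
// Rough sketch: pull keywords out of Bubble via the Data API, find the
// duplicates in plain code, then delete them via the Data API again.
const BASE = "https://yourapp.bubbleapps.io/api/1.1/obj/keyword";
const HEADERS = { Authorization: "Bearer YOUR_API_TOKEN" };

async function deleteDuplicateKeywords(): Promise<void> {
  // Fetch a page of keywords (a real script would paginate via `cursor`).
  const res = await fetch(`${BASE}?limit=400`, { headers: HEADERS });
  const { response } = await res.json();

  const seen = new Set<string>();
  for (const kw of response.results) {
    if (seen.has(kw.text)) {
      // Duplicate text: delete this thing by its unique id.
      await fetch(`${BASE}/${kw._id}`, { method: "DELETE", headers: HEADERS });
    } else {
      seen.add(kw.text);
    }
  }
}
```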


So I have found a solution with the help of @chris.williamson1996 and the following post:

For anybody who comes across this, I will try to elaborate a little bit more on how I did it:


Be aware that this only removes one duplicate per thing. If you have more than one duplicate of the same thing, make sure to catch those as well; a single pass is sketched below.
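To make that warning concrete, here is a rough guess at the shape of a single pass, written in TypeScript purely for readability (the actual solution consists of native Bubble workflow steps; the `Keyword` shape and names are made up):

```ts
// Made-up shape of a Keyword thing.
type Keyword = { id: string; text: string };

// One pass: mark at most ONE duplicate per keyword text for deletion.
// Keywords saved three or more times therefore need additional passes.
function oneDuplicatePerText(keywords: Keyword[]): Keyword[] {
  const firstSeen = new Set<string>(); // texts we keep
  const marked = new Set<string>();    // texts already marked this pass
  const toDelete: Keyword[] = [];
  for (const kw of keywords) {
    if (!firstSeen.has(kw.text)) {
      firstSeen.add(kw.text);          // first occurrence survives
    } else if (!marked.has(kw.text)) {
      toDelete.push(kw);               // one duplicate of this text
      marked.add(kw.text);
    }
  }
  return toDelete;
}
```

Re-running the pass (for example via a recursive backend workflow) until nothing is returned catches triplicates and beyond.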

This approach does not involve any race-condition issues; however, on a WU-based plan it could come at a higher price than other methods.

Another solution could be to make use of an external service API. For example, a model from OpenAI could do the job, and depending on the amount of things this could be cheaper.
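As a sketch of that idea (the model name, prompt, and response handling below are assumptions, and a model's output is not guaranteed to be valid JSON, so treat this as a starting point rather than a drop-in solution):

```ts
// Hypothetical sketch: ask an OpenAI model to de-duplicate the list.
// For plain exact duplicates, the code-only approaches above are cheaper
// and deterministic; a model mainly helps with *near*-duplicates.
async function dedupeViaOpenAI(keywords: string[]): Promise<string[]> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // assumed model name
      messages: [{
        role: "user",
        content:
          "Remove duplicates from this JSON array and reply with the " +
          "de-duplicated JSON array only: " + JSON.stringify(keywords),
      }],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}
```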

Thanks again to everyone involved.

Cheers
