Remove duplicate records: how to

Here is something I ran into recently. I have a database that contains >100,000 records, and some of them are duplicates. My goal was to remove the duplicates. Sounds like a quick task, but “reality is often disappointing”.
On the forum there are a couple of ways to do it:

  1. Mark duplicates and delete with a workflow (make a change on a list) - this won’t work if you have more than a couple of thousand records; in a case like mine, the workflow will time out
  2. Mark duplicates and delete with a recursive workflow - I tried that but didn’t manage to get the settings right, and in any case a server-side workflow would take around 10 hours to search and mark 100,000 records, which I believe is just a waste of time and app capacity

I found an easy workaround that gets the job done, without the worry that you made a mistake in a workflow and deleted the wrong records.

This is how it works

  1. Export all of your database records from the editor as CSV. Include these fields:
    Unique ID and the one you will use to check for duplicates
  2. Convert the CSV file to XLS or Google Sheets format
  3. Use Excel’s built-in functions (e.g. COUNTIF or the “Duplicate Values” conditional formatting rule) to find and mark duplicates - or see the Python sketch after this list if you prefer a script
  4. By the end you should have a file with all the duplicates you want to delete
  5. For each duplicate row, insert a marker value into a field that is not the Unique ID, for example “needtodelete”
  6. Convert the file back to CSV (better to use an online converter, since Bubble didn’t accept the files I tried to save from Excel directly)
  7. Go to the editor, press Modify, choose the data type, choose your CSV, and map the fields
  8. Modify the data
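
If you’re comfortable with a bit of Python, steps 3-6 can be done with a short script instead of Excel. This is only a rough sketch, not Bubble-specific: it assumes your export is saved as export.csv, that duplicates are checked on a hypothetical “email” column, and that the marker goes into a hypothetical “status” field that already exists on the data type and is included in the export. Adjust the column names to match your own CSV headers.

```python
# Rough sketch of steps 3-6: read the export, mark duplicate rows, write a new CSV.
# Assumes "email" is the duplicate-check column and "status" is the marker field.
import csv

seen = set()
rows = []

with open("export.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    for row in reader:
        key = row["email"].strip().lower()  # normalize before comparing
        if key in seen:
            row["status"] = "needtodelete"  # mark every duplicate after the first
        seen.add(key)
        rows.append(row)

# Write a new CSV that keeps the unique id column, ready for the Modify upload.
with open("marked.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

The first occurrence of each value is left untouched, so only the extra copies get marked, and the resulting marked.csv still contains the Unique ID column that the Modify upload uses to match rows back to records.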

Right now you have marked all your duplicates with “needtodelete” in the field you chose.
So only one step is left:
9) Delete the data directly from the editor, or with a workflow using the constraint: chosen field = needtodelete

Not counting the time the export and Modify will take, all these steps should not take more than 20 minutes.
