Using hashes is the best method to detect changes - especially in the backend

I shared a pretty crappy (in hindsight) method of data change detection some time ago.

A few days after, I decided to use hashes instead. I’ve tried many ways to efficiently detect changes across multiple fields in one data trigger, and hashes are the way to go. It’s nothing new in DB management, but it will be quite new to most Bubblers.

How to:

It’s actually pretty easy to pull off because Bubble already has hashing as a native operator.

  • Just create a new field to store the hash.
  • Whenever data gets changed, just create the hash and store it.
  • In the data trigger, just compare the hash from before with the hash from now.

What should be in a hash?

Whatever values that you want to detect changes in:
Just create a combined string of field values

For example: field1-field2-field3-UserID

IMPORTANT: Hashes are affected by the sequence of characters so
field1-field2-field3-UserID is not the same as field3-field2-field1-UserID even if the field values are the same.

Other Benefits

  1. Hashes are always a fixed size regardless of the original string (eg. MD5 always produces a 16-byte digest — 32 hex characters — even if the original string is 1MB)
  2. You can create different hashes to detect different groups of changes
  3. Employ Merkle Trees: Hash a group of hashes from rows of data to detect any change within a large set of rows. (eg. store group hashes as a separate data type and run periodic checks; great for keeping in sync with an external database)
  4. Heck go crazy and hash the hashed hash groups for very large checks.

Conclusion

The reason I like this method is because it doesn’t use any plugins. Hashing is just another Bubble operator.


Can you share a screenshot of the full setup? I’m missing what comes before the formatted as hash operator, so I’m a bit confused about how to set up the examples like field1-field2… I think it’s in the arbitrary text.

That’s a pretty good tip. @boston85719 this just creates a plain text of the item fields you want to track; you can store the MD5 hash any way you want, then compare it in the future against the hash of the same plain text of the same fields.

This will be the data as a list, separated by commas.

How do you make a hash different, like the below?

What @ihsanzainal84 means here is to be consistent in how the values are concatenated before the hash is calculated.

Taking the liberty of rephrasing the question, how do we ensure we don’t get the same hash value for different field values?

We need to consider the allowable data values. For example, if the field delimiter “-” happens to appear in the data to be hashed:

Old values … First Name: - Last Name: -- string to be hashed: -----user01

New values … First Name: -- Last Name: - string to be hashed: -----user01

If this (slightly ridiculous) data needs to be catered for, we can put in field names to ensure a different hash.

Old values … First Name: - Last Name: -- string to be hashed: first---last----user01

New values … First Name: -- Last Name: - string to be hashed: first----last---user01

Edit - for small amounts of data, it may perform better to compare the concatenated string and not bother with MD5 hash.

Thanks for sharing @ihsanzainal84


It’s any set of string values that you want to check for differences. How you want to concatenate it is up to you. I just used the - example for simplicity’s sake. You can hash a JSON object, or a JSON array. Just ensure that your JSON shape is always the same, including indentation (see below).

A simple use case would be a value from a field that stores a caption for a social media posting and you want to ensure that updates to captions will trigger syncs to what has already been posted.

Hashes are designed to be extremely strict, so even a single extra character will result in a different hash. Applepie and Apple pie produce different hashes. You can play around with hashes with a simple setup: Simple MD5 hash test

In the case of checking whether a row of data has changed, the sequence and shape of the value matter most. So far, I haven’t had a need to add unique values for “row data change” hashing because of that strictness.

If you are comparing different rows of data against each other, you can “salt” by using the row’s UID as a unique identifier. It’s the most trustworthy native Bubble value when uniqueness is a requirement.

Use values that fit your workflow

Concatenating is good enough, and as @mishav hinted, hashing is worth the trouble when you expect the comparison string to be much longer than the resulting hash. Btw, you use JSON a lot, and JSON is a great shape to hash.

Important to note that if you are using Bubble’s text editor to create raw JSON, every indentation counts as a character, so you can end up with extra characters that mess with your hash correctness.

In this case, you can go the extra mile and use Run Javascript to build your JSON consistently and hash it in a simple function. Though the extra trouble is only worth it if you already work with JSON, or plan to.

Another use case: Auditing

In my case, one of the systems I am designing requires me to prove that the audit logs are trustworthy.

So, I store the hashes of previous versions by appending them, so it will look like Original > Hash1 > Hash2 > Current. The audit workflow checks integrity by hashing the values of the previous versions and comparing them against that sequence. If the sequence does not match, it shows that a past version was tampered with. You wouldn’t want to store a long unhashed sequence in a field.

It’s nothing new in code, but it’s not something I’ve read about on the Bubble forum yet. It works very well for WU management since data size is a big factor.
