I’m looking into implementing Algolia for our app and am curious how we should think about the architecture.
I’m particularly interested in understanding what would be a good way to keep Algolia’s dataset in sync with our database. We have a fairly big app, and having to run a workflow to pass data to Algolia every time we change any data in our Bubble app would likely be prohibitively expensive - both up-front and as a maintenance cost.
Are there other potential solutions? How might you implement it?
My use of Algolia is limited. But from the few times I’ve used it, I’ve been happy with the process. Happy to connect you with one of the reps I’ve worked with.
As a potential limiter, is it necessary to have the Algolia search updated in real time? If not, you could do a batch update every n minutes. Or, certain fields could trigger an automatic update while other, lower-priority fields go into a batch.
One way to minimize potential cost is to ensure the scope of what triggers an indexing operation with Algolia is limited to what is essential. (I.e., if there is a “Save” button that updates all values but only a minor field was changed, should it still trigger an Algolia indexing op?)
Thanks Dan. This is helpful.
If we were to send updates in a batch, what might that look like in Bubble? Is it simply:
- have a cron hit our site every, say, hour and check for any new rows in key tables (e.g., perhaps: users, events, communities), as well as any rows (in the same tables) that have been updated in the last hour.
- Then send a list of the new records to Algolia, along with a list of the updated records, all through the API.
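That hourly loop could be sketched roughly like this. The field names, the row dicts, and `build_batch_requests` are all assumptions for illustration; the `addObject` and `partialUpdateObject` actions are the ones Algolia’s batch REST endpoint (`/1/indexes/{index}/batch`) actually accepts:

```python
# Hypothetical sketch of the hourly batch step. Row fields
# (`created_date`, `modified_date`, `objectID`) are assumed names.
from datetime import datetime

def build_batch_requests(rows, last_run):
    """Turn changed Bubble rows into an Algolia batch payload."""
    requests = []
    for row in rows:
        if row["created_date"] > last_run:
            # Row is new since the last run: create it in the index.
            requests.append({"action": "addObject", "body": row})
        elif row["modified_date"] > last_run:
            # Existing row changed: patch only the supplied fields.
            requests.append({"action": "partialUpdateObject", "body": row})
    return {"requests": requests}
```

The resulting payload would then be POSTed to Algolia’s batch endpoint, one call per table/index.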
Is that pretty much it? Thanks again.
That sounds roughly in line with what I would suggest. Though I may simplify it a step further and add two fields to each of the relevant tables:
- is_algolia_new (yes/no value, defaults to yes when a new record is created)
- is_algolia_processed (yes/no value, switched to no when a consequential update is made to an existing record, switched to yes as part of the workflow that updates the records in Algolia).
The workflow could be segmented into two parts/lists: one for POST, one for PATCH.
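A rough sketch of that segmentation, using booleans as stand-ins for Bubble’s yes/no fields (the two field names mirror the post; everything else is illustrative):

```python
# Sketch only: split records by the two suggested flags.

def segment_for_sync(records):
    """Split records into POST (new) and PATCH (modified) lists."""
    to_post = [r for r in records if r.get("is_algolia_new")]
    to_patch = [
        r for r in records
        if not r.get("is_algolia_new") and not r.get("is_algolia_processed")
    ]
    return to_post, to_patch

def mark_processed(record):
    """Run after a successful Algolia call, per the workflow described."""
    record["is_algolia_new"] = False
    record["is_algolia_processed"] = True
    return record
```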
Thanks Dan. Great insights.
Any thoughts on simplifying it down to a single is_algolia_processed text field?
if empty - POST
if not empty && ‘modified’ - PATCH
if not empty && ‘processed’ - skip
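As a tiny sketch, that three-state field maps to actions like so (the states and mapping are as proposed above; the function itself is just illustrative):

```python
# Single text field driving the sync decision:
# ""          -> never synced, so POST
# "modified"  -> changed since last sync, so PATCH
# "processed" -> in sync, skip

def action_for(state):
    if not state:
        return "POST"
    if state == "modified":
        return "PATCH"
    return None  # "processed" (or anything else): skip
```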
We’d probably go one of two routes:
- If we batch process, then 1 field for the Algolia-ID and another field for isSynced (yes/no).
- If posting exclusively in (near) real-time, then we’d simply include a field for the Algolia-ID of that row.
One issue with keeping a sync-tracking field on each record is that you have to modify every record to keep track, so the record’s last modified date loses its meaning.
Instead you could have a control table, with fields:
table, last synch date
Then new records are those with created date > synch date, updated are those with modified date > synch date.
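A minimal sketch of that control-table idea, with all names assumed:

```python
# One control row per source table, holding the last synch date.
from datetime import datetime

control = {"users": datetime(2024, 1, 1)}  # table -> last synch date

def changed_since(rows, table):
    """New rows: created after the synch date. Updated rows: created
    before it but modified after it."""
    last = control[table]
    new = [r for r in rows if r["created"] > last]
    updated = [r for r in rows if r["created"] <= last and r["modified"] > last]
    return new, updated

def record_synch(table, ran_at):
    # Advance the watermark only after Algolia confirms the batch,
    # so a failed run is retried from the same point.
    control[table] = ran_at
```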
EDIT - the control table doesn’t need to live in Bubble…
Caveat: manually updating records via the Bubble App Editor doesn’t alter the modified date.
For deletions, I’d suggest that on each delete you add a row to a deletions table, keeping track of IDs and deletion dates.
@supernaturally - that seems viable. As a matter of preference, I prefer not using logic based on if the field is empty in favor of boolean values with a default value. But I think going with your approach would be fine.
@mishav - point well taken about the modified date.
For deletions, would you mind sharing a bit more of your logic around keeping a table for deletions? Would you suggest that over having a user-deleted indicator field (and building constraints around that)?
Yes this is certainly a valid approach to data management, depending on your data retention levels and data structure for particular tables.
I wouldn’t put this in place solely for the purpose of synching to other systems.
From the point of view of synching, this activity is considered an update, not a delete.
Where you do delete records, they won’t be detected by looking for changes since last synch date, or by looking for a field “have I been synched”, because the record is no longer there.
Doing a full search and comparing it to what you have locally is one way to find what was deleted; that can work for small data sets.
For large amounts of data, it makes more sense to keep track of what is being deleted - only the identifiers and a deletion time need to be kept. Then deletions can be kept in synch in a similar way to updates, and purged once synched.
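That delete-tracking step could be sketched as follows. The deletions-table field names are assumptions; `deleteObject` is a real action on Algolia’s batch endpoint:

```python
# Sketch: turn the deletions table into an Algolia delete batch.
from datetime import datetime

def build_delete_batch(deletions, last_sync):
    """Build 'deleteObject' requests for rows deleted since the last
    synch; once synched, those rows can be purged locally."""
    pending = [d for d in deletions if d["deleted_at"] > last_sync]
    return {
        "requests": [
            {"action": "deleteObject", "body": {"objectID": d["objectID"]}}
            for d in pending
        ]
    }
```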
In practice though you’d need to cover the inter-table relationships …
Delete a student, then a class enrolment table which had that student gets updated also. Perhaps the way Bubble removes the student in the enrolment’s list is good enough, but does the enrolment record get its modified date updated too?