How To Build Nested File Structures

I’m building a file management section for an app I’m working on - ProductizrPro, a client portal for productised services like my own. I wanted to leave behind a few tips for anyone that wants to build a file management system within their own app as it’s not as easy as it sounds.

There are a few challenges:

1. how do we display folders and files together?
2. how do we ensure files don’t get orphaned?
3. how do we display breadcrumbs / file tree?
4. when we delete folders, how do we delete the nested folders, and the nested folders in those folders, etc?

Let’s address these one by one. This is very much a general idea guide rather than going over every action so take it or leave it.

  1. Have a single data type for files and folders. The data type I use for files/folders is as follows:

I’ve annotated the fields that are irrelevant to the file system but necessary for my app for privacy rules, organisation etc. Note I’m using Wasabi to store files rather than Bubble native, but the same logic applies to Bubble storage. The ‘File Type’ option set contains File and Folder. This tells us if something is a file or a folder.
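For anyone who thinks better in code, here is a rough Python sketch of a single type covering both files and folders. The field names are assumptions inferred from this thread, not the actual Bubble schema, and app-specific fields are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class FileType(Enum):
    FILE = "File"
    FOLDER = "Folder"


class FileStatus(Enum):
    PENDING = "Pending"
    COMPLETE = "Complete"


@dataclass(eq=False)  # compare by identity; parent/child links are cyclic
class File:
    # Illustrative field names only (assumed, not the real schema).
    uid: str
    name: str
    file_type: FileType
    status: FileStatus = FileStatus.PENDING
    parent: Optional["File"] = field(default=None, repr=False)
    children: List["File"] = field(default_factory=list)

    def add_child(self, child: "File") -> None:
        # Only folders can contain other files or folders.
        if self.file_type is not FileType.FOLDER:
            raise ValueError("only folders can have children")
        child.parent = self
        self.children.append(child)


root = File("a", "Invoices", FileType.FOLDER)
doc = File("b", "invoice-001.pdf", FileType.FILE)
root.add_child(doc)
```

Keeping both kinds of item in one type is what lets a single list show folders and files together.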

  1. We ensure files don’t get orphaned by creating a File as soon as it’s uploaded (or as soon as the upload starts, if you’re using an external plugin). For example, when the user drops a file into the upload popup, we create the File thing and schedule an API workflow to delete it in 6 hours. When the upload finishes, we set the File Status to Pending (from an option set). Pending is what I call files that have reached our server but that the user hasn’t confirmed yet. Once the user clicks save/submit, I mark the File Status as Complete. The scheduled workflow has a condition attached so that it only runs if the File Status is not Complete (i.e. if there’s an error or the user doesn’t confirm the upload, we just delete the file from storage and the database).

  2. Breadcrumbs → URL parameters. I have a URL parameter for folder, and a URL parameter for path. The folder parameter is the unique ID of the currently selected folder, and the path parameter is a comma-separated list of unique IDs. When we click into a folder, we add that folder to the path. To show the breadcrumbs, I have a repeating group of Files:sorted by created date (as child folders are always created after parent folders) with a constraint that unique ID is in Get path from page URL:split by ,. When we click on a breadcrumb to go back, we change the URL parameter so it only includes the path up to the clicked cell’s index, and adjust the folder parameter accordingly.

  3. This is the most challenging part and has huge potential for infinite WU usage. I first need to drill down and identify all of the children of the folder, and the children of children etc etc. I do this by Scheduling API workflow on a list of File’s Children. This runs itself and keeps identifying children of children of children until there are no more child files. This workflow also receives the parameter for the File that we decided to delete (the parent of all the children/grandchildren). Now we’ve identified all of the children, we can work our way back up, one folder at a time. If I’m in folder D, the workflow will delete all of the files in folder D, then schedule the same workflow on D’s parent (C), and then delete itself (D). Then, C will delete its files, then schedule the workflow to run on B, and then delete itself. When X’s Parent File is empty, we know that another recursive workflow has deleted that file, so we can stop the recursive scheduling. In addition, when X is the File that we decided to delete, we can stop scheduling future workflows, else we’ll delete X’s parents.
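The ordering that matters in the deletion step (every child is removed before its parent, and nothing above the chosen folder is touched) can be sketched in plain Python. This is a simplified in-memory traversal, not the scheduled recursive workflow described above; the `children` mapping and the uids are made up for illustration.

```python
from typing import Dict, List


def delete_order(children: Dict[str, List[str]], target: str) -> List[str]:
    """Return uids in a safe deletion order: descendants before parents."""
    order: List[str] = []
    stack = [(target, False)]  # (uid, have we already queued its children?)
    while stack:
        uid, expanded = stack.pop()
        if expanded:
            order.append(uid)  # all descendants are already in `order`
            continue
        stack.append((uid, True))
        for child in children.get(uid, []):
            stack.append((child, False))
    return order


# Hypothetical tree: A > B > C > D, with a few files along the way.
children = {"A": ["B", "f1"], "B": ["C"], "C": ["D", "f2"], "D": ["f3"]}
order = delete_order(children, "B")
```

Deleting B here touches only B's subtree; A and f1 are never visited, which is the same stop condition as "when X is the File that we decided to delete". An explicit stack is used instead of recursion so that very deep trees don't hit Python's recursion limit.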

Tricky, yes, but hopefully it gives people some ideas :slight_smile:


Cool work. Thanks for sharing. :pray:

Some thoughts:

  1. Concerning DB, I am concerned about linking all of the children. Is there a reason why file/folder is an option? I am curious, what are the possible folder type values?
  1. Is a scheduled WF required? Why not run it on approval for all other pending files besides the one in scope?
  1. What is the source for this list of UIDs? Is it constructed as the user unwinds the folders? That means no search functionality? :confused:
  1. not ALWAYS unless you do not allow folders to be moved (or you perform a move by duplicating the folder and “transferring” the children over).

Sorry, Folder Type is another app specific one - it’s either Invoices, Requests, Quotes, or User generated (so essentially indicates whether it’s a system default folder). File Type is either File or Folder (obviously only Folders can have children, be clicked on, etc)

More WU efficient and speedier than Do a search for children with specific Parent based on my app’s expected use case

Don’t think I was very clear. File dropped into file uploader but not saved by user = Pending, and file saved by user = Complete. For instance if a user drags a file into a file uploader then navigates away, that normally orphans the file in the file manager. When the user saves their files (by clicking a button), they make changes to the file uploader’s Files and change all of their status to Complete. If they don’t do this, we know they aborted their upload so can delete them later.
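A minimal Python sketch of that Pending/Complete lifecycle, assuming an in-memory dict in place of the database and a `now` argument in place of Bubble's scheduler. All function names are hypothetical, and the record starts as Pending immediately, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

CLEANUP_DELAY = timedelta(hours=6)  # the delay mentioned in the original post


@dataclass
class Upload:
    uid: str
    status: str           # "Pending" or "Complete"
    cleanup_at: datetime  # when the scheduled cleanup would fire


def start_upload(db: Dict[str, Upload], uid: str, now: datetime) -> Upload:
    """Create the record as soon as the file is dropped and schedule cleanup."""
    upload = Upload(uid, "Pending", now + CLEANUP_DELAY)
    db[uid] = upload
    return upload


def confirm(db: Dict[str, Upload], uids: List[str]) -> None:
    """User clicked save: mark every file in the uploader as Complete."""
    for uid in uids:
        db[uid].status = "Complete"


def run_cleanup(db: Dict[str, Upload], now: datetime) -> List[str]:
    """Delete records whose cleanup is due and that were never confirmed."""
    doomed = [u.uid for u in db.values()
              if u.cleanup_at <= now and u.status != "Complete"]
    for uid in doomed:
        del db[uid]  # a real app would also remove the object from storage
    return doomed


db: Dict[str, Upload] = {}
t0 = datetime(2023, 12, 6, 18, 0)
start_upload(db, "kept", t0)
start_upload(db, "abandoned", t0)
confirm(db, ["kept"])
removed = run_cleanup(db, t0 + CLEANUP_DELAY)
```

The condition on the cleanup (status is not Complete) is what makes it safe to schedule the deletion for every upload up front.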

Sorry not sure what you mean :grimacing: Suppose we have a list of arbitrary IDs A / B / C / D which is the file path. ABCD are folders. The path parameter in the URL is A,B,C,D. The repeating group that displays ABCD (as opposed to just their IDs) is Do a search for Files where unique ID is in A,B,C,D. When a folder is clicked, we remove any Folders after that folder from the page parameter. So, if I click C, we’ll remove D from the parameter. If I click B, we remove C and D (this uses items until # in Bubble logic). The user can go straight from D to A without any issues.
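The path handling is easy to sketch outside Bubble. The Python below is only an illustration of the comma-separated path parameter; the function names are made up, and `index` is 1-based to mirror items until #.

```python
from typing import List, Tuple


def parse_path(path_param: str) -> List[str]:
    """Split the comma-separated path parameter into folder uids."""
    return [uid for uid in path_param.split(",") if uid]


def enter_folder(path_param: str, folder_uid: str) -> Tuple[str, str]:
    """Clicking into a folder appends it; returns (path, folder) parameters."""
    uids = parse_path(path_param) + [folder_uid]
    return ",".join(uids), folder_uid


def click_breadcrumb(path_param: str, index: int) -> Tuple[str, str]:
    """Keep the path up to the clicked crumb (1-based); returns (path, folder)."""
    kept = parse_path(path_param)[:index]
    return ",".join(kept), (kept[-1] if kept else "")


at_c = click_breadcrumb("A,B,C,D", 3)   # drops D
at_a = click_breadcrumb("A,B,C,D", 1)   # jumps straight from D to A
deeper = enter_folder("A,B", "C")
```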

Yep that’s a good clarification. Gets me thinking about how I might approach ordering the breadcrumb trail correctly without relying on created time though!

Re. search, that’s perfectly compatible with my approach. The textContent field on my Files are actually for the text extraction of uploaded files so that they can be searched by text (as can folders).

I got it on the approval process.

the path UIDs question was: when searching and a user clicks on a file that is 4 paths deep, how would you go about constructing the paths in that scenario? Although, looking around on Windows Explorer, I see that they don’t show the breadcrumbs for a search result, but they do let you go up a level.

So I understand the breadcrumbs, and that your use case works with children stored on the file. But in a general use case, where the children list can become unwieldy, I’d modify your approach as follows:

  • Remove children from File data set
  • Add the hierarchy level field to the File data type.

NOTE: Even in your use case, adding the levels makes deleting and moving much easier. :point_down:

When a user clicks on a folder, retrieve all records in a hidden RG needed for the next level (in use cases where there’s a lot of files and lots of jumping back and forth).
E.g., if the selected folder is on level 1, then all records where parent = clicked record and hierarchy level = current level + 1. And add the record to a separate list of folders that indicate which files have been downloaded. The hidden RG can continue to grow without redownloading folders, and within the RG, the parent and hierarchy make for quick filtering and indexing.
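Roughly, in Python terms, the hidden RG behaves like a client-side cache keyed by folder. This is a sketch under assumptions: `fetch_children` stands in for the database search, and the record shape (`uid`, `parent`, `level`) is invented for the example.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, object]


class FolderCache:
    def __init__(self, fetch_children: Callable[[str], List[Record]]):
        self._fetch = fetch_children      # hypothetical stand-in for a DB search
        self._records: List[Record] = []  # the "hidden RG": everything downloaded
        self._loaded: Set[str] = set()    # folders whose children we already have
        self.fetch_count = 0

    def open(self, folder_uid: str, level: int) -> List[Record]:
        """Return the folder's children, hitting the database only once."""
        if folder_uid not in self._loaded:
            self._records.extend(self._fetch(folder_uid))
            self._loaded.add(folder_uid)
            self.fetch_count += 1
        return [r for r in self._records
                if r["parent"] == folder_uid and r["level"] == level + 1]


# Fake database for the example.
DB = [
    {"uid": "B", "parent": "A", "level": 2},
    {"uid": "f1", "parent": "A", "level": 2},
    {"uid": "C", "parent": "B", "level": 3},
]
cache = FolderCache(lambda uid: [r for r in DB if r["parent"] == uid])
first = cache.open("A", 1)
inside_b = cache.open("B", 2)
again = cache.open("A", 1)  # served from the cache, no new fetch
```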

Adding the hierarchies:

  1. helps avoid sorting by created date (in breadcrumb RG) and
  2. makes the deleting WFs more performant and simpler to maintain and debug; and
  3. allows for the simple moving of directories across different folders and levels of the file structure.

The workflow to move folders is the same as deleting folders. No that’s not a mistake! :wink:
For move, you send the hierarchy change based on the parent folder change: find all children and change their levels: find grandchildren, etc.
Same for deleting: just update the hierarchy level to -1 and, once you finish cycling through, send all -1 files for deletion (or wait a bit so you can offer a rollback; the user still won’t see the files he doesn’t want to see once the parent’s level is changed).
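A hedged Python sketch of why move and delete can share one traversal: both just walk the subtree and rewrite the level. The dict layout and function names are assumptions for illustration, with root folders at level 1.

```python
from collections import deque
from typing import Dict, Iterator

Files = Dict[str, dict]  # uid -> {"parent": uid or None, "level": int}


def descendants(files: Files, root_uid: str) -> Iterator[str]:
    """Yield root_uid and everything beneath it, parents before children."""
    queue = deque([root_uid])
    while queue:
        uid = queue.popleft()
        yield uid
        queue.extend(u for u, f in files.items() if f["parent"] == uid)


def move(files: Files, uid: str, new_parent: str) -> None:
    subtree = list(descendants(files, uid))
    if new_parent in subtree:
        raise ValueError("cannot move a folder into its own subtree")
    delta = files[new_parent]["level"] + 1 - files[uid]["level"]
    files[uid]["parent"] = new_parent
    for d in subtree:
        files[d]["level"] += delta  # the whole subtree shifts by the same amount


def soft_delete(files: Files, uid: str) -> None:
    for d in list(descendants(files, uid)):
        files[d]["level"] = -1  # hidden from the UI, still recoverable


def purge(files: Files) -> None:
    for uid in [u for u, f in files.items() if f["level"] == -1]:
        del files[uid]


files: Files = {
    "A": {"parent": None, "level": 1},
    "B": {"parent": "A", "level": 2},
    "C": {"parent": "B", "level": 3},
    "X": {"parent": None, "level": 1},
    "Y": {"parent": "X", "level": 2},
}
move(files, "B", "Y")
levels_after_move = (files["B"]["level"], files["C"]["level"])
soft_delete(files, "B")
purge(files)
```

Until `purge` runs, a rollback is just a matter of restoring the levels, which is the window for undo mentioned above.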


Some other observations:

  1. Why are you using UIDs for breadcrumbs in a path parameter? What about a custom state called breadcrumbs with a list of Files?
    Searchwise, this removes sorting issues, the need to retrieve and splitby, or having to find a whole list of contains (it can be a dozen subdirectories deep!)

Breadcrumbs-wise, a click on any level of the RG doesn’t have to determine where to cut off the path based on index. Instead, it just sets the CS breadcrumbs = CS breadcrumbs :filtered where level <= the level of the clicked file (assuming levels increase with depth).

Even if you want the url parameters for whatever reason, I’d have them feed the CS.
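As a sketch of that idea in Python: the custom state is a list of records, a click trims it by level, and the URL parameter is only read to rebuild the list after a refresh. This assumes level numbers increase with depth (root = 1), so going back means keeping crumbs whose level is at most the clicked one; `lookup` is a hypothetical uid-to-record fetch.

```python
from typing import Callable, Dict, List

Crumb = Dict[str, object]  # e.g. {"uid": "A", "level": 1}


def click_crumb(crumbs: List[Crumb], clicked: Crumb) -> List[Crumb]:
    """Keep only the crumbs at or above the clicked folder."""
    return [c for c in crumbs if c["level"] <= clicked["level"]]


def crumbs_from_url(path_param: str,
                    lookup: Callable[[str], Crumb]) -> List[Crumb]:
    """Rebuild the state from the path parameter, e.g. after a page refresh."""
    return [lookup(uid) for uid in path_param.split(",") if uid]


records = {uid: {"uid": uid, "level": i + 1} for i, uid in enumerate("ABCD")}
crumbs = crumbs_from_url("A,B,C,D", records.__getitem__)
trimmed = click_crumb(crumbs, records["B"])
```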

  1. Any particular reason you are using Wasabi instead of S3 (thru bubble or directly)?
  2. I get positive feedback allowing for dynamic colors and icons for files (usually only appears for uploadedBy user).
  3. when is uploadedBy not createdBy?

These are all good suggestions.

You just loop through D’s parent → C’s parent → B’s parent → A’s parent to identify the paths and then sort.
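In code terms that loop is a walk up the parent pointers followed by a reverse. A small Python sketch, with an invented dict layout and a guard against accidental cycles:

```python
from typing import Dict, List, Optional


def path_to(parents: Dict[str, Optional[str]], uid: str) -> List[str]:
    """Walk from a file up to the root, then return the path root-first."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = uid
    while current is not None:
        if current in seen:
            raise ValueError("cycle in parent links")
        seen.add(current)
        path.append(current)
        current = parents[current]
    path.reverse()
    return path


parents = {"A": None, "B": "A", "C": "B", "D": "C"}
path_param = ",".join(path_to(parents, "D"))
```

Because the path comes out already ordered, this also avoids relying on created date to sort the breadcrumbs.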

This is a nice optimisation I might see if I can implement.

So that if the user refreshes the page they’re taken to the exact same place in the app as they were before.

Price and hearing good things on the forum.

I never trust the Creator / createdBy field as I can’t keep track of who owns which backend workflows etc. I always explicitly define who owns something as a matter of habit across all apps I work on!

True. Actually looked back at some files setups in Bubble and reminded myself that when a search is made, instead of breadcrumbs I provide the option to see in folder, return to search or navigate to next or previous item in search (which native file managers often don’t have and is super annoying to go back and forth when looking for a document)

I’d think those 2 fields should be morphed into type with both OSs and then a File Type Text as well for sorting and the like.

@georgecollier I know that refresh and URL parameters are common themes on the forum and in Bubble dev. But they’re overemphasized when in fact it’s very use-case-specific (that should be its own Tip post).
Here, you are sacrificing primary-use functionality for edge cases when it’s not necessary to pick one over the other. Use the URL params as a backup for refresh, but allow navigation to be quicker and smoother.

Also, by not “penalizing” the user to refresh the page, you are inadvertently nudging them to do so, which can cause a WU stranglehold!

Note: Users may in fact be refreshing to get to root directory (despite the home breadcrumb). I’d give them a popup or some option on refresh for that.

I was thinking about it for a bit and was wondering what you did when displaying the file contents itself?

Resetting the RG every time a user moves up a level and then back down borders on dev malpractice (sorry to be blunt), given the performance impact, unnecessary WU consumption, and the fact that the solution only takes 20 minutes to implement.

Folder’s children.

Hardly. The performance impact is minimal, and caching the files on the page (the parents and children) still results in the same number of searches/gets because every time a user moves you’re still getting a new list - it’s just for the child/parent folder rather than the current folder. All the repeating group displays is Loaded File’s children. Could I WU optimise the heck out of it and keep all of the files on the client side and in the repeating group display Files:filtered and only update the Files when a user accesses a file they haven’t visited before? Yes. Will I? No, because that list will get very long very quickly, so the performance effect from filtering a list every time they move will be just as slow as the database query to get the new folder’s children.

Why? that’s the point, to NOT get a new list if it was already downloaded.

I am not sure I understand, because if you are still getting a new list every time, it will not get too long. But I stress tested both approaches and found that until the hidden RG exceeded 4k+, it was significantly faster. To be sure, nesting all the children on the record may make the DB query faster, reducing the performance gain, but I am pretty sure it is still faster, not to mention cheaper.

How about on your phone? How about when each File has a text field with the file’s text content? I’d much rather keep as much data on the server as possible, rather than the browser.

The performance effect of filtering from records already downloaded is nowhere close to using a DB query to grab the records. The query is straightforward; the performance impact of retrieving the files is much higher (not to mention WU consumption).

That’s in general for searching through records already downloaded. In your specific use case, where the nested files are in the parent folder record, how can you compare the performance impact from grabbing those records from the server to displaying the relevant records that are already here?

You ask EVEN on mobile. I’d say ESPECIALLY on mobile, with its weak and intermittent data connection. Again, finding the records that are present isn’t impacted by it being on a browser, as long as you don’t overload the storage.

NOTE: You can easily remove folders on a FIFO basis if you are approaching browser storage limits.
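One way to picture the FIFO idea, sketched in Python with an `OrderedDict`. The class name and the record cap are made up; the cap stands in for whatever browser storage limit applies.

```python
from collections import OrderedDict
from typing import List


class FifoFolderStore:
    """Keeps downloaded folders; evicts the oldest-loaded folder when too big."""

    def __init__(self, max_records: int):
        self.max_records = max_records
        self._folders: "OrderedDict[str, List[dict]]" = OrderedDict()

    def size(self) -> int:
        return sum(len(records) for records in self._folders.values())

    def has(self, folder_uid: str) -> bool:
        return folder_uid in self._folders

    def add(self, folder_uid: str, records: List[dict]) -> None:
        if folder_uid in self._folders:
            return  # already downloaded, nothing to do
        self._folders[folder_uid] = list(records)
        # Drop the first-loaded folders until we are back under the cap,
        # but always keep the folder that was just added.
        while self.size() > self.max_records and len(self._folders) > 1:
            self._folders.popitem(last=False)


store = FifoFolderStore(max_records=3)
store.add("A", [{"uid": "a1"}, {"uid": "a2"}])
store.add("B", [{"uid": "b1"}, {"uid": "b2"}])  # 4 records > 3, so A is evicted
```

An evicted folder is simply fetched again the next time the user opens it.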

In that context you have a point. In the tips category I’d think that more general use cases should be the primary focus.

You are raising an interesting point. I’m not sure that the text content field should be in the file record. The inability to cache results is itself enough of a reason to put the content in its own data type, and it also adds major weight to a record type that’s constantly being downloaded (files) for a field you don’t need downloaded to the client (content). If the primary way of finding files was searching, that may be different, but having content in the file itself creates race conditions (when extracting text) that are unnecessary.

Hi George, could you walk me through how you have your repeating group set up? I have the data types in the way you laid out here, but I’m struggling with putting it into practice.

What part? Share screenshots of what you’ve got so far :slight_smile:

Screenshots are attached. I’m struggling to figure out how to structure the repeating group so that when a folder is clicked, the page shows its contents.

The simplest way would be to add a workflow on Group name and icon: Display list in Repeating Group: Current cell’s Child Files (or Do a search for Files where Parent File = Current cell’s file).

This will set the data source of the RG to the child files of the folder that is clicked.

I’ve done that and it’s working, now the only issue is I can’t figure out how to account for actual files in the database. Since Folder and File are under the same category - where do files get uploaded, and how are they shown?

I think I found a fix - I created a different data type for “Review Files” then linked the data type to my “Folder”. I made a conditional statement based on the File Type (File, or Folder) so that each item shows up properly.

It’s working for images, now just need to run some tests on video.

If you have any other ideas - happy to hear them!