Forget Folders

Sep 07, 2007 15:55

I twittered this, but it certainly requires explanation. There's a problem with file systems: they treat folders and files as separate kinds of object. This is a mistake, and it will soon be a bottleneck on innovation. We need an alternative.

OS X has taken a step in the right direction with the concept of the "package". Applications, iWork documents, and certain iLife documents are not individual files; they're actually folders that bundle up a bunch of related objects. So if I distribute an application on OS X that uses sounds, images, and so on, the user will "see" one file in the file browser. Firefox, for example, appears to the user as a single file labeled Firefox.

A more savvy user might view the file via the command line. They'd see it listed as Firefox.app/, a folder. Both the file browser and the terminal let you browse into that folder, revealing a structured object that represents an application. All of the supporting files are contained in that folder, but it appears to the user as a single file.

This is a "contains" relationship: a single "file", the application, has a "contains" relationship with all of its components.

But this is still operating in the realm of a traditional file system.

I'm thinking of something very different. In my ideal filesystem, there are no folders. There are two kinds of objects: nodes and relationships. A "node" covers what we conventionally think of as files and folders. It is an entity that is uniquely identified and has data associated with it. A "relationship" associates two nodes (or perhaps n nodes) in a meaningful fashion.
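As a rough illustration, here's a minimal in-memory sketch of that model in Python. Everything in it (the Node, Relationship, and Graph names, the relate and related methods, the use of a random hex id) is made up for the example; it isn't any real filesystem API, just one way the two kinds of objects could look.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class Node:
    """An entity with a unique id and some associated data (hypothetical)."""
    name: str
    data: bytes = b""
    id: str = field(default_factory=lambda: uuid4().hex)


@dataclass(frozen=True)
class Relationship:
    """A typed, directed association between two nodes (hypothetical)."""
    kind: str
    source: Node
    target: Node


class Graph:
    """Holds nodes and relationships; there is no folder type at all."""

    def __init__(self):
        self.nodes = {}          # node id -> Node
        self.relationships = []  # flat list of Relationship objects

    def add(self, name, data=b""):
        node = Node(name, data)
        self.nodes[node.id] = node
        return node

    def relate(self, kind, source, target):
        rel = Relationship(kind, source, target)
        self.relationships.append(rel)
        return rel

    def related(self, source, kind):
        """Return every node that `source` points at via `kind`."""
        return [r.target for r in self.relationships
                if r.source == source and r.kind == kind]


# The Firefox package from earlier, expressed as nodes plus "contains".
fs = Graph()
app = fs.add("Firefox")
icon = fs.add("icon.png", b"\x89PNG")
fs.relate("contains", app, icon)
contents = fs.related(app, "contains")  # the nodes Firefox "contains"
```

A "folder" here is nothing special: it's just a node that happens to have outgoing "contains" relationships.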

So, for example, if I had a document and wanted to insert a picture into it, I could express that as a "contains" relationship: the document "contains" the image. But what if that document references some information in a spreadsheet? Why, that would be a "references" relationship. But there are different kinds of references.

For example, there's a "depends" reference: if the document "depends" on the spreadsheet, then the spreadsheet can't be removed from the filesystem without the document becoming corrupt. A special case of the "depends" relationship could be the "link" relationship. In this case I'm referring to a "compiler linkage", i.e. a programmatic dependency.
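To make the "depends" idea concrete, here's a small Python sketch of a store that refuses to remove a node while something still depends on it. The Store class, DependencyError, and the choice to treat "link" as a kind of "depends" are assumptions made for the example; a real implementation would have to decide how strict to be.

```python
# Relationship kinds that block removal of their target (assumed set).
DEPENDS_KINDS = {"depends", "link"}


class DependencyError(Exception):
    """Raised when removing a node would break a dependent node."""


class Store:
    def __init__(self):
        self.nodes = set()
        self.edges = set()  # (kind, source, target) tuples

    def add(self, name):
        self.nodes.add(name)
        return name

    def relate(self, kind, source, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before relating them")
        self.edges.add((kind, source, target))

    def dependents(self, target):
        """Nodes that would be corrupted if `target` went away."""
        return sorted(s for k, s, t in self.edges
                      if t == target and k in DEPENDS_KINDS)

    def remove(self, name):
        blockers = self.dependents(name)
        if blockers:
            raise DependencyError(f"{name} is required by {blockers}")
        self.nodes.discard(name)
        # Drop every relationship that touched the removed node.
        self.edges = {e for e in self.edges if name not in (e[1], e[2])}


store = Store()
store.add("report.doc")
store.add("budget.xls")
store.relate("depends", "report.doc", "budget.xls")

try:
    store.remove("budget.xls")   # refused: report.doc depends on it
    removed = True
except DependencyError:
    removed = False
```

Removing report.doc first clears the relationship, after which budget.xls can be removed normally.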

Another possible relationship could be "metadata": the image somehow describes the document. Or a "parent-child" relationship, which says that one file (the child) somehow inherits properties from another file (the parent).

There are definitely performance issues with this sort of filesystem layout. There's a reason filesystems are designed the way they are: they're very efficient for read/write operations. They all descend from filesystems designed in an era where throughput really mattered, and it still does.

But we're approaching a point where we have solid state storage devices. On solid state storage, as opposed to current rotational disks, this sort of high-level filesystem could run on top of a low-level FS without a significant performance hit. There are some other advantages too: this would extend very well across a distributed filesystem. It could even run on top of HTTP (HTML sort of supports this via the "rel" attribute on links).

I'm sure this isn't the first time something like this has been suggested.

programming, operating systems
