Files and directories

1 Sept

Written By James

The structure of storage in computers is based on the analogy of real-world files and folders.

I have been thinking about computer storage from the point of view of what programs usually need and want from it, mixed with my own ideals for storage. Many others have written about the limitations of files and directories, so I won’t go into great detail on the problems. Instead I’ll focus on which properties may be useful for programs. Files and directories seem to me straightforward for people to understand and relatively easy for developers to implement. These are great properties, and they have taken us quite far. But from a program’s perspective, what are the useful properties of storage? Perhaps the ideal for programs would be transparent persistence of all program state: all memory would be persistent, with no distinction between RAM, hard disk or internet storage. While this is technically possible, and has already been achieved by some languages, my gut tells me it wouldn’t scale. Games in particular would be too slow; they move and transform too much data as they run their real-time simulations. So, taking a step back from abstracting storage away completely, what is useful?

Programs change over time, and so does the storage associated with them. This becomes even more complicated when different programs want to read or change the same stored data. For example, a new version of an image editing program alters an image, changing the meaning of a field in the data. A game that uses this image hasn’t yet been updated to handle this change. What should happen in this situation? The game could treat reading the image as a failure and stop running, or it could substitute a fallback image so that it keeps running. Both outcomes are common in software today. Another possibility is that the game could read the image as it was before the breaking change. This would be possible if storage were immutable and data were only ever added, much as in version control systems, where nothing is ever lost. There are plenty of arguments against this, such as the complexity only being moved from programs down into the storage system, not removed. Such storage would also probably be slower than a mutable one; that is the trade-off for its desirable properties. It poses another issue too: not all programs need a full history of their storage, yet they would all pay for the overhead. Taken to the extreme, hypertext would benefit from computers using such storage: if data persisted on the Internet too, links would almost never break.
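
To make the immutable, add-only idea a little more concrete, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not a real storage system: the class name AppendOnlyStore, its methods, and the image record fields are all hypothetical. Each write appends a new version instead of overwriting the old one, so a reader that hasn’t been updated can keep asking for the version it understands.

```python
import copy
from typing import Any, Dict, List, Optional


class AppendOnlyStore:
    """Hypothetical toy store: writes never overwrite, they append a new version."""

    def __init__(self) -> None:
        # Every version ever written for each key, oldest first.
        self._history: Dict[str, List[Any]] = {}

    def put(self, key: str, value: Any) -> int:
        """Append a new version of `key` and return its 0-based version number."""
        versions = self._history.setdefault(key, [])
        # Store a deep copy so later changes by the writer cannot rewrite history.
        versions.append(copy.deepcopy(value))
        return len(versions) - 1

    def get(self, key: str, version: Optional[int] = None) -> Any:
        """Return the latest version of `key`, or the one numbered `version`."""
        versions = self._history[key]
        value = versions[-1] if version is None else versions[version]
        # Hand out a copy so readers cannot mutate a stored version either.
        return copy.deepcopy(value)

    def version_count(self, key: str) -> int:
        """How many versions of `key` have been written (0 if none)."""
        return len(self._history.get(key, []))


if __name__ == "__main__":
    store = AppendOnlyStore()
    # The old image editor writes the image using format 1.
    old = store.put("hero.png", {"format": 1, "width": 32})
    # A new editor version changes the meaning of a field (format 2).
    store.put("hero.png", {"format": 2, "width": 32})

    # A game that only understands format 1 keeps reading the version it knows...
    print(store.get("hero.png", version=old))  # {'format': 1, 'width': 32}
    # ...while updated programs read the latest version.
    print(store.get("hero.png"))  # {'format': 2, 'width': 32}
```

The overhead mentioned above shows up even in this toy: `_history` only ever grows, including for programs that never ask for an old version.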

Programs also need a representation of stored data that they can read and change. Currently each program creates its own solution for this, because files are simply streams of bytes. Structure is often imposed with formats such as XML, which are converted back and forth between storage and the language’s in-memory structures. This puts up barriers between programs, adds extra work and complexity, and often leads to programs persisting less than they could. A universal structure for storage is therefore another useful property. Relational databases already provide structure, but they have many drawbacks and are not easy for programs to work with. Programming languages don’t map well to relational data (in my opinion, object-relational mapping is a nightmare). It would be interesting to experiment with programming languages that have native support for relational data storage. Also, the data and structures in relational databases are mutable, which goes against the useful property of immutability discussed above. As with immutable storage, achieving a universal structure makes the storage system more complex and programs less so.

Putting these two properties together, programs could be very different. With structured storage built into a programming language, instead of saves and loads we might have something like the transactions of relational databases. This should make incremental changes to storage very easy. I hope this would make automatic, continual saving more common, ending the need for explicit saves and making programs very smooth to use. Together, these properties may make losing work a thing of the past.
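
As a rough illustration of transactions replacing explicit saves, the sketch below uses Python’s standard sqlite3 module as a stand-in for the language-native structured storage imagined above; it is not that feature, and the table and function names (shapes, open_document, add_shape, move_shape) are hypothetical. Each user edit is committed as its own small transaction the moment it happens, so there is no separate save step, and an edit that fails halfway is rolled back rather than leaving half-written data behind.

```python
import sqlite3


def open_document(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a document whose contents live in a table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shapes ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, x REAL NOT NULL, y REAL NOT NULL)"
    )
    conn.commit()
    return conn


def add_shape(conn: sqlite3.Connection, kind: str, x: float, y: float) -> int:
    """Record one edit as its own transaction and return the new shape's id."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute(
            "INSERT INTO shapes (kind, x, y) VALUES (?, ?, ?)", (kind, x, y)
        )
    return cur.lastrowid


def move_shape(conn: sqlite3.Connection, shape_id: int, dx: float, dy: float) -> None:
    """An incremental change: one UPDATE of the affected row, no whole-document save."""
    with conn:
        conn.execute(
            "UPDATE shapes SET x = x + ?, y = y + ? WHERE id = ?", (dx, dy, shape_id)
        )


if __name__ == "__main__":
    doc = open_document()  # pass a file path instead to keep data across runs
    sid = add_shape(doc, "circle", 1.0, 2.0)
    move_shape(doc, sid, 3.0, 0.5)

    # Simulate a crash in the middle of an edit: the transaction is rolled back.
    try:
        with doc:
            doc.execute("UPDATE shapes SET x = 100 WHERE id = ?", (sid,))
            raise RuntimeError("crash mid-edit")
    except RuntimeError:
        pass

    print(doc.execute("SELECT kind, x, y FROM shapes").fetchall())  # [('circle', 4.0, 2.5)]
```

SQLite tables are mutable, so this only illustrates the transactional half of the idea, not the immutability discussed earlier.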

James
