Not Yer Grandma's Floppy Ds

Apr 01, 2022 03:41

If you think of a given instance of data storage as a model in itself (a hyper-customized, overly specified one that is only useful for running exactly one scenario), then it becomes apparent that, at very large scales, aggregate storage requirements would drop drastically if most data files presently requiring dedicated storage (and often needing to be stored redundantly as well) were instead either dynamically generable rather than manually declared (i.e., simulable), and/or definable from general variables rather than from situationally unique ones (i.e., inferable).

In other words, if we had, say, a truly powerful working model of human civilization, the data storage requirements for that model would be enormous, but we wouldn't need to store bajilliabytes worth of individual YouTube videos at Google's countless data centers: with the proper control systems, we could instead create them on demand. As described above, I can think of two mechanisms for this, which are really just two approaches to the same basic solution. The first, dynamic generation, depends on having a more powerful underlying model that transcends the context of any given data file and can generate numerous daughter files; e.g., if you had a model that generates diatonic chord progressions, you wouldn't need to manually declare all the permutations, nor would you need to store those permutations in order to access them again in the future (which is my point). (This is sort of like the difference between raster and vector graphics: the latter requires a bigger initial setup, but it generally yields smaller file sizes, and its files can be manipulated in major ways almost effortlessly.) The second mechanism, using "general variables" to "infer" specific YouTube videos, is comparable to knowing the properties and behaviors of foods and cookware: a large data set, aye, but far smaller than "The Great Big Cookbook of All Existing Recipes."
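To make the dynamic-generation idea a little more concrete, here is a minimal sketch in Python (my own toy example, with heavily simplified music theory: major keys only, sharps only): instead of storing every diatonic progression, you store the scale-building rule plus a compact description, a key and a list of scale degrees, and regenerate the chords whenever you need them.

```python
# Purely illustrative sketch: store the rule, regenerate progressions on demand.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half-step pattern of a major scale

def major_scale(tonic):
    """Derive the seven notes of a major scale from its tonic."""
    idx = NOTE_NAMES.index(tonic)
    notes = [tonic]
    for step in MAJOR_STEPS[:6]:          # six steps yield the remaining six notes
        idx = (idx + step) % 12
        notes.append(NOTE_NAMES[idx])
    return notes

def diatonic_triad(scale, degree):
    """Stack thirds on a 1-based scale degree (I..VII)."""
    i = degree - 1
    return [scale[(i + k) % 7] for k in (0, 2, 4)]

def progression(tonic, degrees):
    """Regenerate a whole progression from a tiny description: key + degrees."""
    scale = major_scale(tonic)
    return [diatonic_triad(scale, d) for d in degrees]

# The stored "file" is just ("C", [1, 5, 6, 4]); the chords are derived.
print(progression("C", [1, 5, 6, 4]))
# [['C', 'E', 'G'], ['G', 'B', 'D'], ['A', 'C', 'E'], ['F', 'A', 'C']]
```

The point is that the "stored file" shrinks to a handful of integers while the model does the rest.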

For extremely popular YouTube videos (or any stored data files; I'm just using YouTube as an example), it would probably make more sense to maintain dedicated data storage, because if you need to run the exact same scenario over and over again, then at that point it makes sense to clear space in your kitchen drawer for that tool. (This is why virtually everybody has a knife in their kitchen but few people have, say, a strawberry corer, to the extent we can define either of those objects as a discrete thing, which is obviously an oversimplification.)
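In code terms, that trade-off looks a lot like memoization. A toy sketch (Python, with hypothetical names), where only the frequently requested items keep dedicated storage and everything else is regenerated on demand:

```python
# Toy illustration: keep dedicated storage only for the "popular" items.
from functools import lru_cache

@lru_cache(maxsize=16)                  # the kitchen drawer: room for 16 tools
def render(video_id):
    """Stand-in for expensive on-demand generation from a larger model."""
    return f"frames for {video_id}".encode()

render("popular-cat-video")             # generated on the first request
render("popular-cat-video")             # served from the cache thereafter
print(render.cache_info())              # CacheInfo(hits=1, misses=1, maxsize=16, currsize=1)
```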

So, what's my point here? Well, as I hinted, I am concerned about the long-term data storage requirements of our civilization, and also about the frailty of static data storage, as has been amply and painfully demonstrated over the past 50+ years. And while we clearly have a long way to go before we can access specific stored data files on demand without prior access to their explicit contents, I think it is possible to some degree. It wouldn't be all that hard, for instance, to use this hocus pocus to change the shirt colors or designs of people in YouTube videos. Indeed, we already have dynamic controls for whole-image brightness and color correction (not in YouTube, but in some other video viewers and editors). I sometimes wonder about the wasted data when (human) video editors edit their raw video files but don't feed those modifications into some database for future reference and/or statistical compilation and/or AI training. So if we extend this idea, I can see a world where we have, say, an emulation of a given music YouTuber perform a rendition of a given song, without the YouTuber needing to actually create that video.
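As a tiny, hypothetical example of what "feeding those modifications into a database" could look like: store the edit as a few parameters applied to the raw footage, rather than as a whole re-encoded video. (The parameters and function below are my own invented stand-ins, not anyone's actual editing pipeline.)

```python
# Hypothetical sketch: an edit stored as parameters, not as baked-in pixels.
import numpy as np

def apply_edit(frame, brightness=1.0, tint=(1.0, 1.0, 1.0)):
    """Apply an overall brightness gain and a per-channel tint to an RGB frame."""
    out = frame.astype(np.float32) * brightness * np.array(tint, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

# The stored "edit record" is just this dict; the edited frames can be
# regenerated from the raw footage whenever they're needed.
edit_record = {"brightness": 1.1, "tint": (1.0, 0.95, 0.9)}

raw_frame = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)
edited_frame = apply_edit(raw_frame, **edit_record)
```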

But the more interesting use case, for me, is not to replace humans (which is pointless) but instead to store real-world data. Let's say a YouTuber has created a video of themselves performing a specific song. What if it were possible to take a few, or possibly even zero, elements from that data file, plug them into a larger model, and have it spit out the complete video on its own, simply because it knows the Universe so well? The major hurdle strikes me as closing the gap between "close enough" and "exact." Most color and meaning in the world is lost in that gap; this is why most procedurally generated content is so empty of personality.

We already have some precedent for indirectly deriving exact or near-exact solutions in human memory: not only in conspicuous techniques such as mnemonics (e.g., "I before E, except after C, unless it sounds like A...") but in far more general applications too. For instance, those of us who know how to drive can drive on roads we've never been on before, either by studying the map ahead of time (which you might say is a form of simulation, one that occurs prior to execution) or by inferring the navigation of the road from our prior experience and knowledge (e.g., "the road curves here, so I need to turn to stay with it"). Indeed, this is what self-driving cars do.

Therefore, let us consider the possibility of transcending one-for-one data storage and contemplate its useful, viable applications.

P.S. Apropos the title, someone mentioned recently that their grandma grew up using floppy disks, and I was shook.