BitCondenser!!!

May 25, 2005 21:23

I just submitted a project application to SourceForge for BitCondenser.

I'm including the project description below. I'm sharing it now because I'm going to release BitCondenser under the GNU GPL, so it's not as if I consider this "intellectual property" or anything like that. To me, the whole idea of "IP" is a bogus one.

Anyway, here it is, though it's a bit long:

BitCondenser is a next-generation file "compression" program. BC achieves its "compression" by mathematically "describing" the structure of a file's bits. BC "compresses" only one file at a time; combined with tar, for instance, BC could act as the ultimate archiving tool.

BitCondenser shall achieve the goal of shrinking a 40MB executable into a 2KB text file as follows: read the file in binary mode, convert its binary data into the equivalent decimal value, find the power of 2 closest to that value, and record the exponent along with the offset of the value from that power.

For example, the binary number 1111111111111111111111111111111111111111111111111111111111111111, while large, is tiny compared to the binary contents of even a small PNG image found on the Web. Instead of writing out all of those ones, which equal 18,446,744,073,709,551,615 in decimal, it would be both easier and more space-efficient to say that this number is 2^64 + (-1), that is, 2^64 - 1.
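Why 64 ones equal 2^64 - 1: each one contributes a successive power of 2, so the value is a geometric series.

```latex
(\underbrace{11\cdots1}_{64\ \text{ones}})_2 = \sum_{i=0}^{63} 2^i = \frac{2^{64}-1}{2-1} = 2^{64} - 1 = 18\,446\,744\,073\,709\,551\,615
```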

So, if a binary file has the value 1111111111111111111111111111111111111111111111111111111111111111, the file could be reconstructed from two small numbers: 64 and -1.
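The nearest-power-of-2 step can be sketched in C for values that fit in 64 bits. This is a minimal illustration under stated assumptions, not BitCondenser's actual code: the names `condensed_t`, `condense_u64`, and `expand_u64` are hypothetical, ties between two powers are arbitrarily resolved toward the smaller exponent, and a real file would need a big-integer library such as BIG_INT instead of `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: value == 2^exponent + offset. */
typedef struct {
    int exponent;    /* 0..64 */
    int64_t offset;  /* signed distance from 2^exponent */
} condensed_t;

/* Index of the highest set bit, i.e. floor(log2(v)); v must be nonzero. */
static int highest_bit(uint64_t v) {
    int k = 0;
    while (v >>= 1)
        k++;
    return k;
}

/* Find the power of 2 nearest to v and record exponent and offset.
   Ties go to the smaller power (an arbitrary choice for this sketch). */
static condensed_t condense_u64(uint64_t v) {
    condensed_t c;
    if (v == 0) {                 /* 0 == 2^0 + (-1) */
        c.exponent = 0;
        c.offset = -1;
        return c;
    }
    int k = highest_bit(v);
    uint64_t low = UINT64_C(1) << k;
    uint64_t down = v - low;      /* distance to 2^k, always < 2^k */
    uint64_t up = low - down;     /* distance to 2^(k+1), no overflow */
    if (down <= up) {
        c.exponent = k;
        c.offset = (int64_t)down;
    } else {
        c.exponent = k + 1;
        c.offset = -(int64_t)up;
    }
    return c;
}

/* Rebuild the value. Unsigned arithmetic wraps modulo 2^64, so
   2^64 + (-1) correctly yields UINT64_MAX even though 2^64 itself
   does not fit in a uint64_t. */
static uint64_t expand_u64(condensed_t c) {
    assert(c.exponent >= 0 && c.exponent <= 64);
    uint64_t power = (c.exponent == 64) ? 0 : (UINT64_C(1) << c.exponent);
    return power + (uint64_t)c.offset;
}
```

Calling `condense_u64(UINT64_MAX)` gives exponent 64 and offset -1, matching the example above, and `expand_u64` turns that pair back into 18,446,744,073,709,551,615. The offset is kept in a full `int64_t` because, for a value whose highest set bit is bit k, its magnitude can reach 2^(k-1) (for instance 12 = 2^3 + 4).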

BitCondenser shall initially be written in C for the GNU/Linux platform, because C is not object-oriented and the solution must be compatible with other procedural languages. Once a procedural solution is devised, object-oriented solutions may follow. Ultimately, however, the solution must be cross-platform and cross-language.

This project may use the following libraries, provided that doing so does not violate the project's intended license:
  • BIG_INT
  • MD5

The major obstacle to overcome will be memory. Even a 64-bit integer is nowhere near large enough to hold the amount of data to be "condensed": a 40MB file read as a single number has well over 300 million bits. Once a solution to this is hacked together, the next issue will be performance, since examining a 40MB file bit by bit may be taxing on older machines.
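Part of the memory problem can be sidestepped, sketched here as an assumption rather than the project's plan: the power of 2 nearest to a value with n significant bits is either 2^(n-1) or 2^n, and n can be found by scanning bytes without ever building one huge integer. The helper name `bit_length` is hypothetical, and the file's bytes are assumed to be read as one big-endian number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of significant bits in buf, read as one
   big-endian unsigned integer. Returns 0 if every byte is zero. */
static size_t bit_length(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len && buf[i] == 0)          /* leading zero bytes add no value */
        i++;
    if (i == len)
        return 0;
    size_t bits = 0;
    for (unsigned b = buf[i]; b != 0; b >>= 1)  /* bits used by first nonzero byte */
        bits++;
    return bits + 8 * (len - i - 1);         /* plus 8 for each remaining byte */
}
```

For a 40MB file the result is in the hundreds of millions, so the exponent itself stays small; it is the offset that would still need a library such as BIG_INT. Because leading zero bytes do not change the integer's value, a format built this way would also need to record the original file length to restore them.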

Some potential uses of this technology include, but aren't limited to:
  • Replacement of StuffIt, WinZip, 7-Zip, *
  • Piggybacking on PGP/GPG for the ultimate in secure, compressed file sharing over any network.
  • P2P (Gnutella/BitTorrent/*, Napster/iTunes/MSN Music/*, Official Pay-to-Play ROM Services, *)
  • "Virtual Machine" interface: read, write, and execute files from the saved file in "real time". The saved file gets updated "on the fly" (actually, replaced once the new message digest is verified), and files stored inside it get expanded to a "temp" directory, like a virtual cache. File aggregation shall be handled by something such as tar.
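The "replaced once the new message digest is verified" step could follow the common write-to-a-temporary-file-then-rename pattern. This is a minimal sketch, assuming a POSIX system where `rename()` over an existing file on the same filesystem replaces it atomically; `replace_saved_file` and `verify_fn` are hypothetical names, and the digest check (MD5 in the description above) is left to a caller-supplied callback rather than any specific library's API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: returns nonzero if the file at path is valid,
   e.g. by comparing its message digest with an expected one. */
typedef int (*verify_fn)(const char *path, void *ctx);

/* Write data to "<path>.tmp", verify it, then rename it over path.
   Returns 0 on success, -1 on failure (the original file is untouched). */
static int replace_saved_file(const char *path, const void *data, size_t len,
                              verify_fn verify, void *ctx) {
    assert(path != NULL);
    size_t n = strlen(path) + sizeof ".tmp";   /* sizeof counts the NUL */
    char *tmp = malloc(n);
    if (tmp == NULL)
        return -1;
    snprintf(tmp, n, "%s.tmp", path);

    FILE *f = fopen(tmp, "wb");
    if (f == NULL) {
        free(tmp);
        return -1;
    }
    int ok = fwrite(data, 1, len, f) == len;
    ok = (fclose(f) == 0) && ok;     /* fclose flushes the stdio buffer */

    if (ok && verify != NULL)
        ok = verify(tmp, ctx);       /* e.g. a digest comparison */
    if (ok)
        ok = rename(tmp, path) == 0; /* POSIX: replaces path in one step */
    if (!ok)
        remove(tmp);                 /* discard the unverified copy */
    free(tmp);
    return ok ? 0 : -1;
}
```

Readers of the saved file therefore see either the old version or the fully written and verified new one, never a half-written file. On non-POSIX systems `rename()` may refuse to overwrite an existing file, so a cross-platform version would need a different replacement step.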

I hope this gets approved. I have high hopes for this project.

free software, geeky, programming
