The most interesting aspect of identifying similar subsections of files in peer-to-peer networks is how the chunk boundaries are picked: a combination of size limits and content-based markers, located with a rolling hash called Rabin fingerprinting.
That's clever and a useful trick to know. The rest (including the hashing, actually) is pretty much how anyone would have implemented it.
That it causes an improvement is a sad statement about the unequal distribution of network power in the current world.
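
For anyone who has not seen the boundary trick, here is a minimal sketch of content-defined chunking. It is not the scheme from any particular system: a true Rabin fingerprint uses polynomial arithmetic over GF(2), whereas this sketch swaps in a simpler Rabin-Karp style polynomial rolling hash, and every constant and name (`WINDOW`, `MASK`, `MIN_CHUNK`, `MAX_CHUNK`, `chunk_boundaries`, `chunk_hashes`) is an assumption made for illustration.

```python
import hashlib

# All constants below are illustrative assumptions, not values from a real system.
WINDOW = 48             # bytes covered by the sliding window
MASK = (1 << 11) - 1    # boundary when the low 11 bits of the hash are zero
BASE = 257              # multiplier of the polynomial rolling hash
MOD = (1 << 61) - 1     # large prime modulus
MIN_CHUNK = 256         # size limit: never cut a chunk shorter than this
MAX_CHUNK = 16384       # size limit: always cut once a chunk gets this long


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks of data."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    h = 0
    for i, b in enumerate(data):
        pos = i - start  # offset of this byte inside the current chunk
        if pos >= WINDOW:
            # Drop the oldest byte so h only covers the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        length = pos + 1
        # Content marker (hash bits) combined with the size limits.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)  # trailing partial chunk


def chunk_hashes(data: bytes):
    """Return the SHA-1 hex digest of every chunk, in order."""
    return [hashlib.sha1(data[s:e]).hexdigest() for s, e in chunk_boundaries(data)]
```

Because a boundary depends only on the last `WINDOW` bytes, inserting a few bytes near the start of a file shifts the chunk offsets but leaves most chunk contents, and therefore most chunk hashes, unchanged. Two peers can compare the lists returned by `chunk_hashes` to find the subsections they already share. With fixed-size blocks the same insertion would change every block after it.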