It is conjuring from the ether, it is wishing things into your hands, it is just saying a thing’s name and having it appear. It is pure magic. And it may become a very important part of our future, in the Decentralized Web and beyond.
“The great thing of the web is that now knowledge has an address,” said Peter Lyman, the University Librarian of UC Berkeley, 20 years ago of the URL, which means that people can easily build on others’ knowledge. Now we can add something: “Content addressability means that knowledge has a name.” A name can be better than an address because addresses sometimes become obsolete. (Peter Lyman was one of the first board members of the Internet Archive and, I believe, came up with the term “Born-Digital” to describe materials that had never been printed, a new thing in 1996.)
What am I talking about? This might sound too simple to support these big claims, but bear with me. This is one of the big things I have learned from the Decentralized Web work.
Content Addressing starts by processing a digital file into a “hash”, which with SHA-256 is 32 bytes, usually written as a 64-character string of hexadecimal digits. This hash has amazing properties: given a hash, you can confirm that a digital file matches it; further, given only a hash, it is very, very difficult to reconstruct the digital file. And, here is the kicker, given a hash it is almost impossible to create a second digital file that matches it but is not exactly the same as the original.
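To make the size of such a name concrete, here is a minimal sketch using Python's standard `hashlib` module. The function name `content_name` is my own invention for illustration, not part of any particular system.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


name = content_name(b"hello world")
print(name)                      # a string of hexadecimal digits
print(len(name))                 # 64 characters when written as hex
print(len(bytes.fromhex(name)))  # 32 bytes in raw binary form

# Changing even one character of the input produces a different name.
print(content_name(b"hello world!") == name)  # False
```

The name has the same length whether the input is eleven bytes or many gigabytes.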
Therefore, a “hash” is a name for a file in the sense that if you have a hash you are looking for, and someone hands you a file, you can confirm it, and you do not have to trust whoever gave it to you: they cannot fake or counterfeit the file. The file either has the same hash or it does not.
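The check itself is short. The sketch below assumes SHA-256 and reads the file in chunks so that a very large file never has to fit in memory. The helper names `sha256_of_stream` and `matches_name` are hypothetical, and the in-memory streams stand in for files handed over by a stranger.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object chunk by chunk; return the hex digest."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def matches_name(stream, expected_hex):
    """True only if the stream's content hashes to the name we asked for."""
    return sha256_of_stream(stream) == expected_hex.lower()


# The name we are looking for, obtained from somewhere we already trust.
wanted = hashlib.sha256(b"the original file").hexdigest()

genuine = io.BytesIO(b"the original file")
forged = io.BytesIO(b"the original file.")  # differs by one trailing byte

print(matches_name(genuine, wanted))  # True
print(matches_name(forged, wanted))   # False
```

With a real file, the same function would be called on the object returned by `open(path, "rb")`.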
Because the hash is very short, just 64 characters, and can name a multi-gigabyte file, moving these hashes around is very efficient. The Internet Archive has 17 petabytes of web data, but all of the hashes together take only 22 terabytes. Giving every web object a unique name therefore takes only about 0.1% of the size of the data.
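The arithmetic behind that percentage can be checked in a few lines. This sketch assumes decimal units (1 petabyte = 1,000 terabytes); binary units give the same ratio, since both figures scale together.

```python
TERABYTE = 10**12  # assumption: decimal (SI) units
PETABYTE = 10**15

web_data_bytes = 17 * PETABYTE  # size of the web collection, from the text
hash_bytes = 22 * TERABYTE      # size of all the hashes, from the text

fraction = hash_bytes / web_data_bytes
print(f"{fraction:.2%}")  # 0.13%
```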
So, with a hash, one can address content directly, ask for it by name, and confirm that what you are given matches. The most common application of this is in the BitTorrent system, but it is widely used elsewhere. In BitTorrent, one can start with a “magnet link”, which contains a hash, and ask the “distributed hash table” (DHT), which will help you retrieve the file that matches that hash, in this case a “torrent” file. A torrent file, in turn, contains a list of hashes of pieces of files that can then be retrieved through the BitTorrent protocol, and after this magic is done, you will have a set of files on your hard drive that came from tens or thousands of others all over the net.
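The piece-by-piece idea can be sketched in a few lines. This is a simplified illustration, not the real torrent file format or the BitTorrent wire protocol: the tiny piece size, the use of SHA-256 for pieces, and the names `piece_hashes` and `assemble` are all assumptions made for the example.

```python
import hashlib

PIECE_SIZE = 4  # unrealistically small, for illustration only


def split_pieces(data, piece_size=PIECE_SIZE):
    """Cut `data` into consecutive pieces of at most `piece_size` bytes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data, piece_size=PIECE_SIZE):
    """The manifest: one hash per piece, in order."""
    return [hashlib.sha256(p).hexdigest() for p in split_pieces(data, piece_size)]


def assemble(manifest, offers):
    """Rebuild the file from (index, piece) pairs offered by untrusted peers.

    A piece is accepted only if its hash matches the manifest entry at that index.
    """
    accepted = {}
    for index, piece in offers:
        if not 0 <= index < len(manifest) or index in accepted:
            continue
        if hashlib.sha256(piece).hexdigest() == manifest[index]:
            accepted[index] = piece
    if len(accepted) != len(manifest):
        raise ValueError("some pieces are still missing")
    return b"".join(accepted[i] for i in range(len(manifest)))


original = b"knowledge has a name"
manifest = piece_hashes(original)
pieces = split_pieces(original)

# Pieces arrive out of order from different peers; one of them is a forgery.
offers = [(2, pieces[2]), (0, b"fake"), (0, pieces[0]),
          (4, pieces[4]), (1, pieces[1]), (3, pieces[3])]

print(assemble(manifest, offers) == original)  # True
```

The forged piece is simply dropped, and the genuine piece for the same position is accepted when it arrives from another peer.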
Therefore, if there are others on the peer-to-peer network that are serving files, and you have a hash, then you can ask the network to give you that matching file or piece of a file, and there can be no counterfeiting.
This matters because materials can be served from many places, including libraries and archives, and kept permanently available long after the original server is gone. I think of it as a way to have the same book be in many libraries, and even if the publisher goes away, and several of the libraries merge, you still have a chance to get the book. This is different from a website, where if the website goes away, you are either out of luck or, if something like the Wayback Machine has a copy, you are saved, but you have to trust us. So in a way, this hash idea is bringing back some nice features from the printed era. A much more reliable system of digital publishing is possible in this way.
(This is how IPFS, ZeroNet, Dat, and just about every decentralized system works, but I think it is still under-appreciated magic. The next miracle I will describe is how cryptographically signed files can bring us the next step: updatable digital files that are served from everywhere and nowhere.)