Collector or Digital Librarian?

Do you think more often of your date of birth or date of death?
Do you think of the start or the end?
Do you think of the project or the deadline?
Do you think of your journey or your legacy?
How much substance is there in your soul versus value in your impact on others?

I imagine these are the differences between the Collector and the Digital Librarian. The collector seems to want to experience, to learn and also to share. The collector discovers, understands and reveals to others.

The digital librarian has no long term memory other than what is captured in the library. Where the collector lives for life’s expressive expanses, a digital librarian is designing for not-being-here-anymore. When a doctor says, as the end nears, “you should get your affairs in order…” it is gloomy, foreboding, and tragic to the collector. The Digital Librarian says, “That is all I ever do.”

It is not that the Digital Librarian does not want to live forever—in fact that may be the driving emotion—it is just that the method of living forever is not corporeal, it is informational. We fight mortality by trying to share, and share permanently.

In creating the Great 78 Project, I have wanted to keep the notes of what records were in whose collections. I believe this may be the most important thing—more important than the recordings—what records were together?

If we want to understand a time or a life, it is made up of those groupings. As a Digital Librarian I want to illuminate for others those lives, those perspectives — I want to not lose those past lives through reorganization. But I don’t think I will be the one to learn from these lives, those choices, those perspectives. It will be other people, or even machines that will learn from these assemblies.

Bill Dunn said in the mid 1980s, “The metadata is more important than the data itself.” Astonishing—how did he know? He came up with the term “metadata” with Mitch Kapor around that time.

Collections are metadata, and metadata of great value if they reflect a life’s choices. Those life choices may be the most valuable part of the Great 78 collection.

As a Digital Librarian, I feel I should, I must preserve this, share this.

But it is not for me, it passes through me. I am a Digital Librarian, not a collector.

I hope I do a good job during my brief stay on this earth.

Posted in Uncategorized | 1 Comment

The Great 78 Project

The project was announced a month ago with 15,000 digitized “sides” of 78 RPM records, and I am trying to understand it and to understand myself through it. B George of the Archive of Contemporary Music is also the Internet Archive’s music curator, but he does not know as much about 78s because he specializes in LPs — mostly from the 1980s on. So we have donated collections, and he is going through them and picking large sections to be digitized. The slant on our contribution of recordings to this project really comes from the collections that have come in.

Barrie Thorpe collected records and donated them to the Batavia Public Library in Illinois decades ago; we have that collection of maybe 40,000 records. “Tercat” is from Rhode Island. A collection of polka, others…

I am interested in keeping the collections together because I think that is where much of the value is, but I am not sure. Maybe these are not original collections but remixes by the next generation.

I want to know what the early 20th century sounded like. Midwest, different countries, different social classes, different immigrant communities and their loves and fears. I am not looking for the great record, the unfound gem, I don’t think. At this point I am looking for ‘discovery’, for inspiration, for leads I can follow up on, maybe I am looking for a rationale for spending time and money digitizing this stuff.

As a librarian, I see justification through others — is it useful? Did someone say “thank you” or “I love what you do”? But what is this collection or collection of collections? “Selection” is easily muted if you go for “comprehensive” — an easy out.

But what is this? How about a “modern discography”? A reference collection that is more than a listing in a book of the releases of a particular label — more than one that has pictures of the labels. This one has the digitized sound recordings.

This “reference collection” moniker works because we do not really care if we have the physical disc (though that is nice because we can go back to it to re-record or study it, or at least know it is safe). What we want is for it to be findable with a click.

For instance, I am reading a book called, “Do Not Sell at Any Price: The Wild, Obsessive Hunt for the World’s Rarest 78rpm Records” by Amanda Petrusich. In it, she talks of many records, performers, labels, collectors, re-issues. I want to click on the paper page and hear it. I want citations to turn into blue links so I can more deeply understand what she is talking about. Is this weird? I don’t think so. Footnotes were always supposed to be hyperlinks, we just did not have the tech with paper. But now we have ebooks — let’s bring the citations to life. To do this we need a reference collection, ta-da.

But there is something more for me personally — I get a thrill as the materials come online. Each disk is a revelation. It feels like when I was in college and I would buy a used record for $3 and bring it home and play it — “retail therapy.” Ownership and discovery and possession. We just made a Twitter feed out of the newly available digitized 78s… we will see if anyone else likes it.

I also like playing a list of 78s based on a search — all hillbilly, all yodeling, all by a performer. It is less curated than a compilation LP or CD. It is serendipitous. It is kind of random. It has discovery feelings. Maybe it is a new way to find things, more like how YouTube goes from one video to the next. A new thing?

So what is this? A reference collection? A collector’s dream? A discovery radio station? The soundtrack of the early 20th century? All Good. All Fun.

All told, I would say the Great 78 Project is building a reference collection.

Posted in Uncategorized | Leave a comment

Upgraded Secure Communications Applications I am Now Using

I am upgrading the security of my communications while trying to keep things easy to use. I thought I would share what I currently use in case it is helpful to copy, and I would appreciate comments.

I want end-to-end encryption so nobody can intercept what I am saying (unless they have infected my phone or computer, but that is another issue), and bonus points for making it so that it is unknown who I am communicating with and when (private metadata and traffic). Skype, phone calls, sms/texts, Slack, and email are now known to not be private (at least by default) thanks to Edward Snowden. This is too bad since I still use these.  (Slack is not end-to-end encrypted even for direct messages, which it could and should be.) So far I have only partially achieved the first step: end-to-end encryption. I am migrating to:

  • Signal for point-to-point instant messaging, replacing sms and Skype. Free software, free of cost, and open source; works on smart phones, and with a Chrome-based desktop Signal app on my Mac (which is what I mostly use).   It uses phone numbers as identifiers, which is kind of a pain.  An EFF friend called this “best of breed” for security. Small development staff. I have donated.
  • for 1-on-1 and small group video chat that is end-to-end encrypted, replacing Skype. This does not require a download or an account. Go to the homepage, type a bunch of characters to make a meeting room, then send the resulting url to someone and they can use that throw-away meeting room.  Super easy. It uses WebRTC (now standard in browsers) over https, and they say it is end-to-end encrypted.   They have an iPhone app as well, but I don’t know about its security. This does not seem designed for super high security, but seems to be pretty good.
  • for larger group video chats, replacing Webex. Free of cost for most of my uses, easy to use (requires a download, but is super easy). It says it is end-to-end encrypted, with a little lock icon when in use and encrypted.
  • Facetime occasionally on my iphone replacing cellphone calls to friends with an iphone.  Apple says that it is end-to-end encrypted.
  • Thunderbird + Enigmail to sign all email, receive encrypted email, and sometimes send encrypted email, with an organizational email server (not gmail).  Enigmail is moderately hard to set up; I had help at a meetup.  Cost free, and I believe free and open source software. I am donating.
  • encrypted notes file (the mac Notes app) on my mac for high priority secure notes. It syncs the encrypted file with my iphone via icloud.
  • Breadwallet, bitcoin wallet on my iphone, for small amounts of bitcoin for casual purchases.  Super easy and a full wallet (does not hang off a server). Love this wallet. Cost free.  I invested a tiny amount of money in the company– great guys.
  • Torbrowser for private web browsing beyond Firefox’s Private browsing feature.  Free and open source software, cost free. I have donated.

Any comments or ideas are welcome. I realize I have traded off security for ease of use.  I hope stronger tools get easier, and I suggest we all invest in tools based on donations and development help.  I wish I knew my mac and iphone were not compromised.  Not sure how to do that.

I have tried Ricochet as an instant messaging client that hides who I am talking to via Tor; it is easy to use, but few people I know use it, so I don’t use it often.  I have tried encrypting my email using pgp via Enigmail but have run into trouble with others being able to read it, so I do not encrypt email by default. As an aside, encryption is related in a funny way to content-addressable systems, which is a different subject, but that is magic and the future.

—– From a commenter: —–

Web search:  DuckDuckGo or     (thank you, Reinout)

Posted in Uncategorized | 7 Comments

Using for Supplies Globally– uh, Not Yet

I thought I was so clever– instead of buying all the parts for our book scanners from industrial vendors then assembling, testing, and re-shipping from Richmond, California, we could send many parts to where they were needed and assemble in our scanning centers.  We wanted to buy all the parts through, all on one Amazon business account, which would make it easy to track.  We thought we could try this for a scanning center in Hong Kong.

Well, here are the problems we encountered:
1) the prices for cameras and lenses were 30% more than if we bought from selected stores we could bargain with,
2) some vendors would not ship internationally,
3) most electronics vendors had limits on how many you could buy, like 2 or 4 or 5. What a pain. Even “amazon basics” were limited,
4) a couple of the vendors only had a couple in stock,
5) shipping to Hong Kong means deliveries take at least 9-10 days because they come from the US,
6) shipping to Hong Kong is expensive because there is no Prime or free shipping or local delivery.

All in all it took hours after we had figured out which ones we wanted.  (thank you Salem!)

So we ended up making many sub-accounts for our business account to get around some limits, waiting for restocking from some vendors, and shifting around vendors. On the positive side, we could use our business credit card to get a kickback in miles, and we also used to try to get a kickback to the Internet Archive.   We should have used the affiliate code, which might have given us a bit more of a kickback.

Oh, well. is meant for individual consumers– and for that it works well.   For this application, not so much.


Posted in Uncategorized | Comments Off on Using for Supplies Globally– uh, Not Yet

Content Addressing is Magic

It is conjuring from the ether, it is wishing things into your hands, it is just saying its name and it will appear. It is pure magic. And it may become a very important part of our future– in the Decentralized Web and beyond.

“The great thing of the web is that now knowledge has an address,” said Peter Lyman, the University Librarian of UC Berkeley, 20 years ago of the URL, which means that people can build easily on others’ knowledge. Now we can add something: “Content addressability means that knowledge has a name.” A name can be better than an address because addresses sometimes become obsolete. (Peter Lyman was one of the first board members of the Internet Archive and, I believe, came up with the term “Born-Digital” to describe materials that had never been printed– a new thing in 1996.)

What am I talking about? This might sound too simple to stand up these big claims, but bear with me. This is one of the big things I have learned from the Decentralized Web work.

Content Addressing starts by processing a digital file into a “hash”: a 64-character string of hexadecimal digits, representing 256 bits (using SHA-256). This hash has amazing properties: given a hash, you can confirm that a digital file matches it, but given only a hash, it is very, very difficult to create the digital file. And, here is the kicker: given a hash, it is almost impossible to create a second digital file that matches it but is not exactly the same as the original.

Therefore, a “hash” is a name for a file in the sense that if you have a hash you are looking for, and someone hands you a file, you can confirm it, and you do not have to trust who gave it to you– they can not fake or counterfeit the file. The file either has the same hash or not.
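To make that verification step concrete, here is a minimal sketch in Python using the standard library’s hashlib (the function names are my own, for illustration only):

```python
import hashlib

def content_name(data: bytes) -> str:
    """Name a file by its SHA-256 hash: a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_name: str) -> bool:
    """Check a file handed to us by an untrusted party against its name."""
    return content_name(data) == expected_name

original = b"some digital file"
name = content_name(original)
print(len(name))                            # 64 hex characters (256 bits)
print(verify(original, name))               # True
print(verify(b"a counterfeit file", name))  # False
```

However the file arrives, a failed check means it is not the file you asked for, so no trust in the sender is required.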

That the hash is very short, 64 characters, and can name a multigigabyte file means that moving around these hashes is very efficient. The Internet Archive has 17 petabytes of web data, but all of the hashes together are only 22 terabytes. Therefore, giving every web object a unique name takes only about 0.1% of the size.

So, with a hash, one can address content directly, ask for it by name, and confirm that what you are given matches. The most common application of this is in the BitTorrent system, but it is widely used. In BitTorrent, one can start with a “magnet link”, which contains a hash, ask the “distributed hash table” (DHT), and it will help you retrieve the file that matches that hash, in this case a “torrent” file. A torrent file, in turn, contains a list of hashes of pieces of files that can then be retrieved through the BitTorrent protocol, and after this magic is done, you will have a set of files on your hard drive that came from tens or thousands of others all over the net.

Therefore, if there are others on the peer-to-peer network that are serving files, and you have a hash, then you can ask the network to give you that matching file or piece of a file, and there can be no counterfeiting.
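A toy illustration of that principle: a store where the hash is the address. This is not the actual data structure of IPFS or BitTorrent, just a sketch of how the name and the content verify each other, so any peer can serve the data.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: data is named by its SHA-256 hash.
    Anyone can serve the data; the requester verifies it on receipt."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha256(data).hexdigest()
        self._blobs[name] = data
        return name  # the hash is the address

    def get(self, name: str) -> bytes:
        data = self._blobs[name]
        # Verify before returning: a malicious peer cannot counterfeit.
        if hashlib.sha256(data).hexdigest() != name:
            raise ValueError("data does not match its name")
        return data

store = ContentStore()
name = store.put(b"hello, decentralized web")
print(store.get(name))  # b'hello, decentralized web'
```

In a real peer-to-peer network the dictionary would be spread across many machines, but the verification logic is the same everywhere.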


Why can this be important? Materials can be served from many places, served from libraries and archives, and kept permanently available long after the original server is gone. I think of it as a way to have the same book be in many libraries: even if the publisher goes away, and several of the libraries merge, you still have a chance to get the book. This is different from a website, where if the website goes away, you are either out of luck, or, if something like the Wayback Machine has a copy, you are saved, but you have to trust us. So in a way, this hash idea is bringing back some nice features from the printed era. A much more reliable system of digital publishing is possible in this way.

(This is how IPFS, Zeronet, DAT, and just about every decentralized system works, but I think still it is under-appreciated magic. Next miracle I will describe is how cryptographically signed files can bring us the next step: updatable digital files that are served from everywhere and nowhere.)

Posted in Uncategorized | 4 Comments

Successful German Model for Permanently Tenant-Friendly Housing

Started in 1989 and now made up of 107 apartment buildings in Germany, a network of housing projects called the Mietshäuser Syndikat has locked itself into a structure that allows almost complete autonomy for each housing project, with a couple of exceptions: a building cannot be sold or condo-ized without the permission of a central organization.

This form of “some rights reserved” housing provides more security for the tenants from market-based rent fluctuations and evictions.    It also provides a structure and advice network for those wanting to build tenant housing associations that will own and control their buildings over a long term.

In the United States, I have seen people drawn to the urban land trust structure for some of these advantages.   This German model has the advantage of a fixed, and low, fee structure (2.5 cents per square foot per month) and a transparent and very limited set of rights reserved for the central entity.   In this way, the central entity is deliberately limited to a small function to keep autonomy in the housing projects themselves.

I have been interested in restricting new debt being put on a member building, but that is a step further than this model goes.   In the case of Foundation Housing, the benefits are allocated to non-profit use, either by accumulating assets or by lower-than-market usage fees.    So this German system is a step in these directions, and given its rapid uptake, a successful step.

I hope this idea spreads!

Thank you to Ben Woosley for pointing this out to me.


The joint venture



Posted in Housing, Uncategorized | 2 Comments

Custom School Short Video Description and Short Video by Kid having done it



And, for more good news, Logan got into Cornell a year earlier than his years would suggest, and he is doing well there.   So custom schooling has been working well.

Posted in Uncategorized | Comments Off on Custom School Short Video Description and Short Video by Kid having done it

Paper on systems like Foundation Housing

Housing for non-profit workers does not seem to be represented in this paper, but union housing and permanently affordable housing are.   And hence, I suggest it is worth reading.

The paper, from 1996, usefully describes experiences with Community Land Trusts in urban areas and Mutual Housing Associations (which I have loved since discovering Richmond’s successful Atchison Village).

This paper brings context to those working on a “third sector” (not public housing, or for-profit housing).

I liked this paper because it gave real examples of success and failures and tried to draw conclusions and trends.  Unfortunately, most of the groups they studied were under 10 years old.   I hope they update this study, or if anyone knows of an update, I would like to see it.    Unvarnished experience is valuable.




Posted in Housing | 1 Comment

Divertissement for Warming Orchestra #D4

Having just gone to the symphony tonight, I would like to propose a new piece of music, called Divertissement for Warming Orchestra.

Here is the “score”: It is a form of call-and-response.  When the orchestra is warming up, any player plays a short segment of a familiar tune.  Then someone else in the orchestra responds, maybe with the next part, maybe something as a riff.  For instance a small part of twinkle twinkle, or Alice’s Restaurant, or Gilligan’s Island theme, or Star Wars, or …

The little back and forth can go on for no longer than 30 seconds, and not be obvious.  It has to just tickle the ear, fire a neuron, and then be gone.  If the conductor looks like she might come on, then it is to stop.  The musicians are not to show any indication they are doing this, so this piece is to be heard but not seen.

Anonymous, crowd sourced, guerrilla music might just make being in the audience before the performance really fun.

If you believe this piece has been played, maybe tweet about it with hashtag #D4   (as in “Divertissement for”).  If your orchestra is banned from playing this piece, then also tweet #D4.

We will know this piece is successful if it is banned in symphonies in several cities beginning with C.

Posted in Uncategorized | Tagged | Comments Off on Divertissement for Warming Orchestra #D4

Locking the Web Open: A Call for a Decentralized Web

(Short form article, Short lecture, Long lecture, demo of a fraction of the idea of a distributed website (or paste this link in maelstrom))

Over the last 25 years, millions of people have poured creativity and knowledge into the World Wide Web. New features have been added and dramatic flaws have emerged based on the original simple design. I would like to suggest we could now build a new Web on top of the existing Web that secures what we want most out of an expressive communication tool without giving up its inclusiveness. I believe we can do something quite counter-intuitive: We can lock the Web open.

One of my heroes, Larry Lessig, famously said “Code is Law.” The way we code the web will determine the way we live online. So we need to bake our values into our code. Freedom of expression needs to be baked into our code. Privacy should be baked into our code. Universal access to all knowledge. But right now, those values are not embedded in the Web.

It turns out that the World Wide Web is quite fragile. But it is huge. At the Internet Archive we collect one billion pages a week. We now know that Web pages only last about 100 days on average before they change or disappear. They blink on and off in their servers.

And the Web is massively accessible– unless you live in China. The Chinese government has blocked the Internet Archive, the New York Times, and other sites from its citizens. And other countries block their citizens’ access as well every once in a while. So the Web is not reliably accessible.

And the Web isn’t private. People, corporations, countries can spy on what you are reading. And they do. We now know, thanks to Edward Snowden, that Wikileaks readers were selected for targeting by the National Security Agency and the UK’s equivalent just because those organizations could identify those Web browsers that visited the site and identify the people likely to be using those browsers. In the library world, we know how important it is to protect reader privacy. Rounding people up for the things that they’ve read has a long and dreadful history. So we need a Web that is better than it is now in order to protect reader privacy.

But the Web is fun. The Web is so easy to use and inviting that millions of people are putting interesting things online; in many ways pouring a digital representation of their lives into the Web. New features are being invented and added into the technology because one does not need permission to create in this system. All in all, the openness of the Web has led to the participation of many.

We got one of the three things right. But we need a Web that is reliable, a Web that is private, while keeping the Web fun. I believe it is time to take that next step: I believe we can now build a Web that is reliable, private, and fun all at the same time. To get these features, we need to build a “Distributed Web.”

Imagine “Distributed Web” sites that are as easy to setup and use as WordPress blogs, Wikimedia sites, or even Facebook pages, but have these properties. But how? First, a bit about what is meant by a “distributed system.”

Contrast the current Web to the Internet—the network of pipes on top of which the World Wide Web sits. The Internet was designed so that if any one piece goes out, it will still function. If some of the routers that sort and transmit packets are knocked out, then the system is designed to automatically reroute the packets through the working parts of the system. While it is possible to knock out so much that you create a chokepoint in the Internet fabric, for most circumstances it is designed to survive hardware faults and slowdowns. Therefore, the Internet can be described as a “distributed system” because it routes around problems and automatically rebalances loads.

The Web is not distributed in this way. While different websites are located all over the world, in most cases, any particular website has only one physical location. Therefore, if the hardware in that particular location is down then no one can see that website. In this way, the Web is centralized: if someone controls the hardware of a website or the communication line to a website, then they control all the uses of that website.

In this way, the Internet is a truly distributed system, while the Web is not.

Distributed systems are typically more difficult to design than centralized ones. At a recent talk by Vint Cerf, sponsored by the California Academy of Sciences, Cerf said that he spent much of 1974 in an office with two other engineers working on the protocols to support a distributed Internet system, to make it such that there are no central points of control.

Here’s another way of thinking about distributed systems: take the Amazon Cloud. The Amazon Cloud is made up of computers in datacenters all over the world. The data stored in this cloud can be copied from computer to computer in these different places, avoiding machines that are not working, as well as getting the data closer to users and replicating it as it is increasingly used. This has turned out to be a great idea. What if we could make the next generation Web work like that, but across the entire Internet, like an enormous Amazon Cloud?

In part, it would be based on peer-to-peer technology—a system that isn’t dependent on a central host or the policies of one particular country. In a peer-to-peer model, those who are using the distributed Web are also providing some of the bandwidth and storage to run it.

Instead of one Web server per website we would have many. The more people or organizations that are involved in the distributed Web, the more redundant, safe, and fast it will become.

And it also needs to be private—so no one knows what you are reading. The bits will be distributed—across the net—so no one can track the readers of a site from a single point or connection. Absolute privacy may be difficult to achieve, but we can make the next Web much more secure.

The next generation Web also needs a distributed authentication system without centralized usernames and passwords. That’s where encryption comes in to provide a robust but private identity system.

We’d also want to bring in some other features if we’re going to redo this Web.

This time the Web should have a memory. We would like to build in a form of versioning, so the Web is archived through time. The Web would no longer exist in a land of the perpetual present.

On library shelves, we have past editions of books, but on the Web, you don’t have past editions of websites. Every day is a new day, unless you know to use the Internet Archive’s Wayback Machine, which may have copies of previous versions. Where the Wayback Machine was created after-the-fact to solve this problem of the current Web, in this next iteration we can build versions into the basic fabric of the Distributed Web to provide a history and reliability to our growing digital heritage.
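One way versioning could ride on top of content addressing, sketched very loosely in Python: each version record names its content by hash and points to the previous record’s hash, so one short name captures a verifiable history. The record format here is invented for illustration; real systems such as IPFS do this differently.

```python
import hashlib
import json

def name_of(blob):
    """Name any blob of bytes by its SHA-256 hash."""
    return hashlib.sha256(blob).hexdigest()

def new_version(content, prev_record_hash, store):
    """Store a site version plus a record linking it to the previous one."""
    content_hash = name_of(content)
    store[content_hash] = content
    record = json.dumps({"content": content_hash,
                         "prev": prev_record_hash}).encode()
    record_hash = name_of(record)
    store[record_hash] = record
    return record_hash  # one short name for the entire history

store = {}
v1 = new_version(b"my blog, first edition", None, store)
v2 = new_version(b"my blog, second edition", v1, store)

# Walking back from v2 recovers v1's record and content, all by hash.
print(json.loads(store[v2])["prev"] == v1)  # True
```

Because every link is a hash, no one can silently rewrite an old edition without changing every name that follows it.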

We could also add a feature that has long been missing from the Web: easy mechanisms for readers to pay writers. With the support of easy payments, the Distributed Web could evolve richer business models than the current advertising and large-scale ecommerce systems.

Adding redundancy based on distributed copies, storing versions, and a payment system could reinforce the reliability and longevity of a new Web infrastructure.

Plus it needs to be fun—malleable enough to spur the imaginations of millions of inventors. This new Web could be an inviting system that welcomes people to share their stories and ideas, as well as be a technology platform that one can add to and change without having to ask permission– allowing technological change just for the fun of it.

How can we build this new Distributed Web? There have been many advances since the start of the Web in 1992 that will be helpful.

We have computers that are a thousand times faster. We have JavaScript that allows us to run sophisticated code in the browser. So now, many more people can help to build it.

Public key encryption systems were illegal to distribute in the early 90’s, but are now legal, so we can use them for authentication and privacy. With strong cryptography, communications can be made safe in transit and can be signed so that forgery is much more difficult.

We have Block Chain technology that enables the Bitcoin community to have a global database with no central point of control.

And we have virtual currencies such as Bitcoin, which could make micropayments work in a distributed environment. Many other projects have pushed the limits of distributed systems giving us building blocks for a Distributed Web.

I’ve seen each of the necessary pieces work independently, but never pulled together into a new Web.

I suggest we need a bold goal, one that is understandable and achievable. Something that we might be able to rally around, and have multiple groups contribute to, in order to build an easy to use Distributed Web.

What about WordPress, but distributed? WordPress is a very popular toolkit that millions have used to build websites. My blog,, for instance, is built on the open source WordPress software installed on a server at the Internet Archive. Free to use, and free to host, this toolkit enables anyone to select from a set of template designs and modify it to give it a unique look. Then the original creator can appoint users to play roles such as administrator, editor, or commenter. Those with these different privileges can, in turn, grant privileges to others as appropriate. And then the writers can post articles or images to its pages or change the look and feel of the site.

A WordPress website, traditionally, would then be hosted on a computer of the creator’s choice, either on, or on other sites offering hosting, or even on their own computer because the underlying software is available open source as well. This is where WordPress is not “distributed,” in the sense we were talking about earlier. If the organization hosting the site does not like the material, or it is blocked in another country, or goes out of business, then the website will not be available. Even major companies, such as Apple, Google, and Yahoo, have taken down whole systems hosting millions of users’ websites, often with little notice.

We would like to allow anyone to build a WordPress website–that has themes and different people with different roles, fun to go to and add to, free to create—which is also distributed in a way that is private and reliable.

We would want it to work in all browsers with no add-ons or modifications. We would want to refer to a distributed website with a simple name like and it needs to be fast.

We would need users to be able to log in without having to have many websites know their usernames and passwords, or have a central site, like Facebook or Google, control their online credentials. In other words, we need a distributed identity system.

Additionally, we would like to have payments work in the Distributed Web. We would like to enable anyone to pay anyone else, akin to leaving a tip, or paying a suggested amount for reading an article or watching a movie. Thus people could get paid for publishing on this Distributed Web.

In addition, we would want to have saved versions of websites, and dependable archives, to make these distributed websites reliable.

How can we build this system?

A Way to Build the Distributed Web: an Example

Please bear with me as I try to argue that this is possible using an amalgam of existing or near-existing technologies.

A piece of this system could be a peer-to-peer system such as Bittorrent. Storing and retrieving files in a distributed way has been commonplace for years with Bittorrent. While downloading custom software is not ideal, it shows this function can be done and done for millions of people. Bittorrent is kind of magic, where typing a long number that is a unique identifier for a file or set of files will cause it to appear on your machine. Pieces of the desired file will come from other computers that had previously retrieved those files and therefore store them on their computers. In this way, the readers of files become the servers of those files. There are millions of users of Bittorrent sharing everything from commercial movies, to free software, to library materials. The Internet Archive, for instance, offers petabytes of files to the public using the Bittorrent protocol so that users have the option to retrieve files from the Internet Archive or from other users who might be closer.
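The piece-wise scheme can be sketched as follows. Real BitTorrent uses SHA-1 piece hashes and much larger pieces (hundreds of kilobytes); SHA-256 and a tiny piece size are used here only to match the rest of this post and keep the example small.

```python
import hashlib

PIECE_SIZE = 16  # illustrative only; real torrents use far larger pieces

def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE):
    """Split a file into fixed-size pieces and hash each one, torrent-style.
    Each piece can then be fetched from a different peer and checked alone."""
    return [hashlib.sha256(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]

file_data = b"a file large enough to be split into several pieces"
hashes = piece_hashes(file_data)
print(len(hashes))  # 4 pieces for this 51-byte example

# A piece arriving from any peer is verified independently:
piece0 = file_data[:PIECE_SIZE]
assert hashlib.sha256(piece0).hexdigest() == hashes[0]
```

This is why the readers of files can safely become the servers of those files: every piece is checked against the published list of hashes before it is accepted.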

Using Bittorrent as part of the Distributed Web to share the files is working in prototype form now. Bittorrent Incorporated’s peer-to-peer powered Web browser Maelstrom is now in alpha release. With this browser, a set of files can be distributed using Bittorrent. Using this early version, I demonstrated at a conference last month a static version of my blog being served by people around the Internet.

Notice in this image that the Web address starts with bittorrent:// followed by a long number. This is how the website was retrieved from the Bittorrent network.

Another system, IPFS, designed and implemented by Juan Benet, is open source and has some of the same peer-to-peer characteristics, with some added enhancements. Juan took my blog and in a few minutes put it into his system, showing that that system also works. One of the major features it offers over Bittorrent is that updates to the blog can be discovered and distributed naturally through the system. Therefore, as people add comments and posts to a blog, these can be retrieved without having to get a new identifier.

Notice in this case that the Web address refers to localhost, meaning that the pages are being retrieved by a computer program running on my laptop, which is operating the peer-to-peer functionality.

Other distributed systems are in different stages of development, which will certainly be useful. Many of these systems are listed at the end of this paper.

Therefore the idea of storing and retrieving files that are part of a distributed website is now a reality in prototype form. But there are still some pieces missing.

Building Seamlessly on Top of the Existing Web

One feature that would greatly ease adoption would be to have distributed websites work seamlessly in readers’ browsers without any add-ons, plug-ins, or downloads: just click and see.

This is important because software on phones, tablets, and laptops is becoming more difficult to install without the permission of a company, such as Apple. Fortunately, it is easy to distribute JavaScript as part of Web pages, and this will likely be supported for a long time because it is important to sites such as Google Docs and Google Maps.

JavaScript running in users’ browsers as a kind of application platform is now possible and usable. I was surprised to find that JavaScript is now powerful enough to emulate older computers in the browser. For instance, you can now run an IBM PC emulator running MS-DOS 3.1 running a computer game just by clicking on a weblink to go to a webpage. Games such as Oregon Trail, Prince of Persia, and old arcade games are now available on the Internet Archive and have been played by millions of people. The way this works is that others have made emulators of the underlying machines in the programming language C, and that code is then cross-compiled into JavaScript. So, when a user goes to such a page and clicks to run a program, the browser downloads a JavaScript program that boots an emulator of an old IBM PC or an Apple II. It then reads a floppy, in this case a virtual floppy, and runs that program in the emulator, so that you are basically experiencing that old computer interface. It was a strange mind twist for me to download and run a whole machine emulator in a browser. Since JavaScript is capable enough to do that, we can build the mechanism we need for the Distributed Web in JavaScript.

To run a distributed system in the browser, we need one more feature: the code running in the browser must be able to connect to other browsers that are running the same system. Basically, we need to make it so that a browser can contact another browser instead of going to a server. This is now achievable thanks to a new standard, WebRTC, which was created to allow video conferencing and multiplayer games.

With the underlying speed of modern machines, the maturity of a coding system like JavaScript, and the peer-to-peer features supported in browsers, we seem to have all the pieces we need to support a Distributed Web on top of the current Web without any downloads, plug-ins, or add-ons.

There is an additional advantage to building the Distributed Web in JavaScript: it can be changed and added to by many people independently. In fact, different websites might use different Distributed Web systems all interoperably on the Internet at one time. It does not require coordination or relationships with the browser manufacturers to make changes to how the Distributed Web works. Features can be added, subtracted, and experimented with in parallel, without permission. The Distributed Web could evolve much faster than current Web technologies and yet still be interoperable.

Distributed Websites that have Search Engines and Databases

Since WordPress sites have search and database functions for selecting posts from particular months and with particular tags, our distributed websites need these features as well to be fully functional. In the current Web, programs running on a server support these features: when the user types a few words into a search box, the words are sent to the server, and a program runs on the server to create the page that is then transmitted back to the browser. In the Distributed Web there are no servers; there are only static files retrieved from a peer-to-peer network. Luckily, some of the files of the website can themselves be computer code in the form of JavaScript. All of the computation then happens in the browser based on those files.

Fortunately this is possible because a search engine and its index can be packaged as files that are downloaded to a browser and run there. This feature has been achieved in the demonstrations based on Bittorrent and IPFS mentioned before; the programmer, Marcel van der Peijl, used the open source tool js-search to build an index plus search engine in JavaScript from the pages of my blog site. For my site, the resulting JavaScript page was one megabyte, which is large, but not too large for broadband users. To make this more usable, he loads this code only after the page the user requested has been displayed, so in most cases the user would not notice the delay.
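To make this concrete, here is a minimal sketch of the idea behind a client-side search engine: an inverted index is built from the site's pages when the site is published, shipped as a file, and queried entirely in the browser with no server round-trip. (js-search, mentioned above, offers a much richer version of this; the page contents and URLs below are made up.)

```javascript
// Hypothetical pages of a blog; in practice these would be the
// site's real posts, and the index would be generated at publish time.
const pages = [
  { url: "/2015/02/locking-the-web-open", text: "reliable private distributed web" },
  { url: "/2015/01/wayback", text: "wayback machine archives web pages" },
  { url: "/2014/12/bittorrent", text: "bittorrent shares files peer to peer" },
];

// Build an inverted index: word -> list of page URLs containing it.
const index = {};
for (const page of pages) {
  for (const word of page.text.toLowerCase().split(/\s+/)) {
    (index[word] = index[word] || []).push(page.url);
  }
}

// Searching is then a pure lookup, done entirely on the client.
function search(word) {
  return index[word.toLowerCase()] || [];
}

console.log(search("web")); // URLs of the pages mentioning "web"
```

Because the index is just another static file, it can be served peer-to-peer like everything else on the site.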

This approach will work for most blogs, but maybe the largest ones will need more sophistication.

Therefore, we can have distributed websites that include dynamic elements such as search engines and databases.

Adding New Posts and Other Changes to a Distributed Website

A key feature of a WordPress site is adding comments or posts. This is trickier in a distributed setting than in centralized systems because updates have to be made in many places. In the WordPress application we do not need the website to be up-to-the-second for every reader, but we need to propagate changes quickly.

Bittorrent has a facility called “mutable torrents” which allows updates, but currently this requires a centralized facility to keep track of the newest version. This has the disadvantage of making the user contact a central server to find the most up-to-date version. This central server could be watched or controlled by a third party.

IPFS, on the other hand, has a truly distributed facility for supporting updates and versions. IPFS is a very clever system with some of the features of a Unix file system, but one that supports versions; how this works is some of the genius of the system. Since we have seen that a distributed WordPress site can be made out of files, containing images and text as well as code that can be retrieved and run in the browser, a distributed file system could hold and transmit the required files.

So there are solutions, even in a distributed way, to have millions of updates and not have to resort to central control or central notification that could impact our goal of protecting reader privacy.

The Wayback Machine of the Distributed Web

The Wayback Machine is a free service of the Internet Archive that allows people to see past versions of websites. We do this by periodically visiting a website and downloading and storing the webpages being offered.

A Wayback Machine for the Distributed Web could store versions as they are created because it is easy to recognize what has changed and store it. This way, the Wayback Machine would have some significant advantages over the current one—it could always be up to date and it could help serve the current website and past versions to users without their even knowing it. This way the user would not need to go to the website to access the Wayback Machine. It would just start serving versions of the website on request, including the current version. If it did not have those files, then it could find them from other servers to add to the archive. Therefore, the Wayback Machine would be a form of host for the current version of the website, since it would participate in offering files to the readers. The Wayback Machine would therefore make the Distributed Web more reliable.

If someone referred to a past version of a website, and if the Wayback Machine had those files, it would serve those as well. In this way, the Wayback Machine would become more tightly integrated into the Distributed Web.

Many Wayback Machines could be run by many different organizations in a smooth way. As more groups participate, the more reliable and robust this system would become.

There is another significant advantage to the Wayback Machine application in the Distributed Web: it would archive and serve fully functional websites, not just snapshots of what they looked like over time. All of the functionality would be served, so search and database functions would remain usable forever, including in past versions. This way, distributed websites would live on in time and space even if there were a disruption in hosting or authorship.

In this way, a library, such as the Internet Archive, could preserve and provide access to websites that are no longer maintained, or where the authors have moved on to other projects. This is similar to what libraries have done with professors’ research papers—offering enduring access to past works so that people can learn from them.

Therefore the Distributed Web would have a major advantage because it could be easily archived and served in a distributed and enduring way.

Fast Performance

By having institutions such as the Internet Archive offer access to distributed websites, users would get a more reliable service, and it could also provide better performance. Other organizations are also motivated to provide fast and reliable access for their users, and could help replicate the data and make the Distributed Web more robust. Internet Service Providers (ISPs), for example, want their users to have a good Web experience and would be likely to serve as a close and fast host for their users. This would also save those companies on bandwidth bills because more of their traffic would be local. In this way, both cultural institutions and commercial organizations have an incentive to replicate parts of the Distributed Web, increasing reliability and performance for users.

Surveillance and Censorship

Since the Distributed Web would have users and repositories all over the world, both hosted by institutions and by other readers of the Distributed Web, some of the techniques for surveillance and censorship would become more difficult. For instance, the so-called Great Firewall of China blocks access to some websites outside of China by watching all traffic on its borders and filtering based on which websites are being accessed. Since a distributed website does not have a single location it would be more difficult to monitor or block its use. Furthermore, if one copy gets behind a firewall of this kind, then it can be replicated inside, making censorship more difficult.

The encryption used in this traffic may make it difficult to even know which files are being requested in the first place. Therefore, some of the existing systems of surveillance and censorship will not be as easy to conduct in the Distributed Web.

Easy Names for Distributed Websites

We also want easy-to-remember names for distributed websites. When the Internet was first designed, computers were addressed by strings of numbers called IP addresses. These were not easy to remember, so a naming system called the Domain Name System (DNS) was created, allowing someone to remember a name rather than a numeric address. The Web, being built on the Internet, uses these names in its universal resource locators.

In the Distributed Web, we have a similar problem with long, hard-to-remember numbers. In the implementations described above for both Bittorrent and IPFS, a webpage is identified by a unique, incomprehensible string such as 88f775eea02293b407e4b22c69d387cb9bbf50b8 or /ipfs/QmavE42xtK1VovJFVTVkCR5Jdf761QWtxmvak9Zx718TVr. It would be much more convenient to have a name such as https://brewstersblog.arc.

The Domain Name System could be used for this purpose and would probably be a good starting point, because it would leverage a large investment in technology and in the social machinery for regulating who gets which names. The Distributed Web could also incorporate new naming systems alongside the DNS to support new approaches to naming and the technologies behind them.

One distributed naming system that currently exists is called Namecoin, an open source system built on a Bitcoin-like Blockchain, which is itself a distributed system. To understand Namecoin, let’s start with some of the characteristics of Blockchain technology.

The Blockchain is a form of distributed database that is used to store the ledger underlying Bitcoin and similar systems. It is very clever in how it maintains consistency even when none of the participants trust each other. People submit “transactions” by signing them with their private cryptographic keys, and offer a financial tip to those who compete to operate the Blockchain consistency system, the so-called “miners.” The Blockchain, then, is a way to register transactions that everyone can see and everyone agrees to. In the case of Namecoin, the Blockchain is used to register a claim on a name and the long number with which it will be associated.

In this way, people can register a name-and-address pair in the Blockchain and others can look it up in a distributed manner. Unfortunately, looking up a name is a time-consuming process, but at least it is certain who registered a name first. Improving lookup performance is a separate task.

Another system that could be used for this is the Distributed Hash Table, or DHT, which is central to the way Bittorrent works. This is another distributed system for looking up a name.

So if this is done correctly, we can have easy-to-remember names resolve to distributed websites quickly, securely, and privately.

Furthermore, there could be registrars that charge for new names, and in return offer services such as fast servers and permanent archives. This could be a new business model that helps support the system.

To have a distributed naming system work in current browsers, without modification, we need a way to resolve a name to an address in JavaScript without contacting a server. Fortunately, there is a mechanism to do this using an anchor tag.

Therefore we can have a simple system for naming distributed websites without losing privacy or reliability.

Distributed Identity

To know who is allowed to update a blog, we need a system to register administrators and then to authenticate someone as being that person. That is achieved on current WordPress sites when a user creates an account with a username and password using a Web page. This is kept in a database on the server. If a similar system could be implemented with a distributed webpage that operates the database, we could make the system more secure and easier for people to use.

Another way current websites often work is that one logs in using one’s Google, Facebook, or Twitter account information. This way a user does not have to give a password to many different sites, but it has the disadvantage that large corporations learn a great deal about one’s behavior online.

A better system might be one that uses cryptography to allow users to create multiple account credentials and use these without necessarily tying them back to their persons. That way people would have control over who knows what about them, and if they wanted to walk away from an account, that would work as well.

This could use what is called public key encryption, which uses special math functions to create pairs of public and private keys. The private key is used to sign documents in such a way that anyone using the public key, which is publicly known, can verify that the document was correctly signed. No one else can forge a document. Thus, if posts were signed on the Distributed Web, readers could verify that a particular user had the authority to perform that action, and the website would never need to know a user’s password or private keys.

Making Money by Publishing on the Distributed Web

Public-private key pairs are central to how Bitcoin works, and this fact can be useful. In Bitcoin, a public key is used as the account name, such as 1KAHLE1taA85EXaVm1XuVYtbGp839MyEzB. With Bitcoin, people can create as many accounts as they want. An account really has an effect only when someone has created a transaction using it, thereby depositing Bitcoins into that account. Anyone can deposit money (Bitcoins) into an account, but only the holder of the private key can transfer the money out to another account.

If the Distributed Web uses the same math function for creating public and private keys that Bitcoin does, then the Distributed Web’s identity system will be compatible with Bitcoin accounts. This has an interesting advantage that anyone could leave a tip for any writer on the Distributed Web because his public key would be his Bitcoin account. In this way, we could make it easy for payments, even very small ones, to be made in the Distributed Web.

I believe it would be even possible to use Bitcoin-like technology to require a payment before a reader can decode a file, say a movie. In this way, we may have a distributed way to sell digital files on the Internet without any central clearinghouse. It would still be possible to rip someone off by buying a file, decoding it, and then redistributing it, but this is true now. What would be different is that it would be easy to make micropayments and full purchases on the Distributed Web without third parties getting involved or taking a slice. Automated tipping could even be installed to try micropayments as a default behavior.

Locking the Web Open

In conclusion, through the last 25 years, people have poured their lives and dreams into the World Wide Web, yielding a library and communication tool that is unprecedented in scale. We can now build a stronger tool on top of the current Web to offer added reliability, privacy, and fun.

Our new Web would be reliable because it would be hosted in many places and in multiple versions. People could even make money, providing an extra incentive to publish in the Distributed Web.

It would be more private because it would be more difficult to monitor who is reading a particular website. Using cryptography for the identity system makes it less tied to personal identity, so one can walk away from an account without being personally targeted.

And it could be as fun as it is malleable and extendable. With no central entities to regulate the evolution of the Distributed Web, the possibilities are much broader.

Fortunately, the needed technologies are now available in JavaScript, Bitcoin, IPFS/Bittorrent, Namecoin, and others. We do not need to wait for Apple, Microsoft or Google to allow us to build this.

What we need to do now is bring together technologists, visionaries, and philanthropists to build such a system that has no central points of control. Building this as a truly open project could in itself be done in a distributed way, allowing many people and many projects to participate toward a shared goal of a Distributed Web.

Together we can lock the Web open.

We can make openness irrevocable.

We can bake the First Amendment into the code itself, for the benefit of all.

We can build this.

We can build it together.

Decentralized and Distributed systems and communities: Maelstrom by Bittorrent, MaidSafe, Namecoin / Ethereum, Bitcoin for payments, Proof of Storage (blockchain), Oceanstore, I2P, IPFS, Storj, Peer5, Tahoe-LAFS, Twister, PeerJS / WebRTC, BitcoinJS.
