I never used the tools of Big Data and artificial intelligence to build profiles of masses of people. And that was on purpose. I realized what could be done– the power in knowing too much, the temptation to manipulate. But now others have crossed that line: Total Information Awareness, Facebook, Equifax. Now we see the problems, now we have a problem, and we need ideas on where to go from here.
I started collecting databases in the early 1980s, at the Artificial Intelligence Lab at MIT and then at Thinking Machines. I knew what could be done as we built supercomputers.
The artist David Byrne wrote, “In the Future, there will be so much going on, no one will be able to keep track of it.” His statement haunted me in the early 1980s; I felt it was importantly wrong. “No one” person could keep track, but computers could. In the late 1970s I read a book about how computers were used in South African apartheid. It turned out to be very simple record keeping: a couple of mainframes from England (IBM did not want to sell to them) were all it took to keep a population of millions under control.
In the 1980s I spent time at library school and studying the information habits of high-priced consultants at KPMG Peat Marwick. I wanted to know what people asked about. Often it was as simple as “people, products, companies.” I was up for helping with the products and companies, but not the people.
When we released the software for the first Internet publishing system, Wide Area Information Servers, into the public domain in April of 1992, I included an essay with every copy titled “The Ethics of Digital Librarianship.” It was a kind of guide, a warning, to those who would start to accumulate information on users in the form of usage logs: unwitting traces of what others were thinking about. Usage logs give intimate insights into individuals, and the Internet addresses the requests come from can let whoever holds the logs know whose thoughts they are.
Some think I came up with the term “Big Data” in the early 1990s. I could have been the one; it is hard to know. But I certainly promoted it. If I did coin it, then it came from playing with Laurie Anderson’s song title “Big Science.” In that song, released in 1982, Laurie warns of:
“Every man, every man for himself
All in favor say aye
Big Science. Hallelujah.”
“Big data” came after databases and before data mining. Profiling, redlining, targeting. Data analytics. I knew that if big data were unleashed by libertarian, free-for-all thinking (the “what can be built must be built” type of thing), we could get mass manipulation by those who could afford it and those who wanted it: intelligence agencies, corporations, and those wishing to influence elections.
There would be those who would cross the line to build profiles on a massive scale– Facebook, Cambridge Analytica– but I was not going to join them. I was not going to use these powerful tools for this.
As we have built large datasets at the Internet Archive, there are some we can let anyone do anything with, but most allow aggregation of information about people, and that makes us uncomfortable letting anyone do just anything with them. Do we need an ethics board to sift proposals, as they do in medical studies? People are sold the idea that there is no “personally identifiable information” in these large datasets once they have been “anonymized.” Maybe they want to believe it. But it is almost certainly not true.
I don’t know of a technical way to keep us from making databases of profiles, or to restrain manipulation. We need to hold ourselves back from those edges. But many with money justify going over those edges, with calls for “security” and other righteous causes.
We have the tools of Big Data now; will we keep our humanity? I am still looking for answers, and for ways we can help. Any ideas?
The IA’s work on decentralization seems promising.
This talk has some specific ideas: http://idlewords.com/talks/what_happens_next_will_amaze_you.htm
*Scroll down to the “six fixes” section if you’re in a hurry.
And a few persons, like Leo Szilard and Norbert Wiener, kept their humanity while working with technologies no less troubling. Maybe we can keep them in mind.
A huge part of the solution will be “defusing” data. You don’t need to get my DNA to get a lot of information on my genetic make-up… any close relative of mine who’s willing to swap some dollars for their own information helps to make that possible. So… make knowing anyone’s genetic make-up less valuable, e.g., through universally provided health care. That won’t stop someone from being curious about your genes, but employers have far less reason to care, and that’s one of the big reasons to be concerned today, when you get red-lined for your potential health care liability. You probably can’t stop the sharing and aggregation of data (and I say this as one of your ex-intelligence-officer friends 🙂), but you can reduce the harm that having it could cause.