One difference between the world before the WWW and now is access to data. It used to be that few data sets were online and available. Just collecting the data, or inputting it, was a task of monumental scale. But today, maps of the United States down to the most amazing level of detail are readily available. Many people must have worked on that project for years and years, and there must be hundreds, if not thousands, of people working to maintain the accuracy of that data as it changes with time. But we do not see that and, somehow, we do not seem to have to pay for it. Someone does, but not us.
It used to be that certain groups would
maintain their datasets and you could get a copy by sending them a
tape. Astronomy catalogs are a famous example. We all used the
YBSC (Yale Bright Star Catalog) because that is what was readily
available. Now there are many catalogs available, easily, on the
Internet. Of course, that does not mean it is all that easy to
interpret that data, but that is a different (if related) problem.
By accident, while my life is
collapsing around my head, and the future looks as dark as it ever
has, I was wasting time as I usually do and came across a remarkably
useful collection of datasets. It seems to be associated with the
Introduction to Computer Science course at Princeton University.
You can find that dataset in its
original form here.
War and Peace, what we normally call
"the Bible", the works of Dickens, the Official Scrabble Dictionary, a few human chromosomes,
the ranked list of last names in America, all the locations of
Wendy's burgers in the world, and so forth. The Communist Manifesto.
It's all here.
The table of contents, in image form, is:
Now you will never need data again.
I plan to download all of these and make a giant tar of them available on the internet. Why? Because I believe that everything that exists on the WWW will go away, one day, whether we know it or not.