Fri 5 Dec 2008
Amazon Web Services is making a variety of large data sets available in the cloud. This is great news, as these giant data sets are often difficult to find, compile, and host.
So far the list of data sets includes some biological and chemical data, census information, and labor reports. I'd love to see this list grow to include the complete history of the GSS, for example. In another area, Amazon should keep a complete, unpacked, current dump of Wikipedia in the cloud. The complete XML dump of the English-language Wikipedia with all revisions is in the tens of terabytes, I think.
